Overview

Dataset statistics

Statistic                        curated_md_report  concatenated_md_report
Number of variables              37                 24
Number of observations           21881              22588
Missing cells                    344985             233282
Missing cells (%)                42.6%              43.0%
Duplicate rows                   0                  0
Duplicate rows (%)               0.0%               0.0%
Total size in memory             6.2 MiB            4.1 MiB
Average record size in memory    296.0 B            192.0 B
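
The missing-cell percentages can be cross-checked from the other figures in this table. The sketch below assumes the percentage is missing cells divided by (variables x observations); the function name is illustrative and not part of ydata-profiling.

```python
# Figures copied from the Dataset statistics table of this report.
datasets = {
    "curated_md_report": {"variables": 37, "observations": 21881, "missing_cells": 344985},
    "concatenated_md_report": {"variables": 24, "observations": 22588, "missing_cells": 233282},
}


def missing_cell_pct(variables: int, observations: int, missing_cells: int) -> float:
    """Share of empty cells in an observations x variables table, in percent."""
    return 100.0 * missing_cells / (variables * observations)


for name, stats in datasets.items():
    print(f"{name}: {missing_cell_pct(**stats):.1f}% missing cells")
```

Under that assumption both results agree with the reported 42.6% and 43.0%.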

Variable types

Type                             curated_md_report  concatenated_md_report
Text                             9                  7
Numeric                          3                  3
Categorical                      23                 12
Boolean                          2                  2

Alerts

"Alert not present" means the alert was not raised for that dataset.

High correlation (each variable is highly overall correlated with the fields listed)

Variable                                 curated_md_report                              concatenated_md_report
age_group                                age_group_ontology_term_id and 4 other fields  age_max and 3 other fields
age_group_ontology_term_id               age_group and 4 other fields                   Alert not present
age_max                                  age_group and 5 other fields                   age_group and 2 other fields
age_min                                  age_group and 5 other fields                   age_group and 2 other fields
age_years                                age_group and 7 other fields                   age_group and 3 other fields
antibiotics_current_use                  fmt_id and 3 other fields                      fmt_id and 2 other fields
body_site                                body_site_ontology_term_id and 9 other fields  dietary_restriction and 4 other fields
body_site_ontology_term_id               body_site and 9 other fields                   Alert not present
control                                  control_ontology_term_id and 5 other fields    fmt_id and 3 other fields
control_ontology_term_id                 control and 5 other fields                     Alert not present
country                                  country_ontology_term_id and 10 other fields   dietary_restriction and 5 other fields
country_ontology_term_id                 country and 10 other fields                    Alert not present
dietary_restriction                      body_site and 9 other fields                   body_site and 5 other fields
feces_phenotype_metric                   age_years and 10 other fields                  age_years and 5 other fields
feces_phenotype_metric_ontology_term_id  age_years and 10 other fields                  Alert not present
fmt_id                                   age_group and 13 other fields                  age_group and 8 other fields
fmt_role                                 body_site and 6 other fields                   Alert not present
hla                                      age_max and 10 other fields                    Alert not present
hla_ontology_term_id                     age_max and 10 other fields                    Alert not present
sex                                      hla and 2 other fields                         Alert not present
sex_ontology_term_id                     hla and 2 other fields                         Alert not present
smoker                                   country and 7 other fields                     country and 3 other fields
smoker_ontology_term_id                  country and 7 other fields                     Alert not present
target_condition                         antibiotics_current_use and 15 other fields    antibiotics_current_use and 7 other fields
target_condition_ontology_term_id        antibiotics_current_use and 15 other fields    Alert not present
tumor_staging_ajcc                       body_site and 6 other fields                   body_site and 4 other fields
tumor_staging_tnm                        antibiotics_current_use and 6 other fields     antibiotics_current_use and 4 other fields
westernized                              country and 12 other fields                    country and 6 other fields

Imbalance (each variable is highly imbalanced by the percentage shown)

Variable                                 curated_md_report  concatenated_md_report
body_site                                81.2%              81.7%
body_site_ontology_term_id               81.2%              Alert not present
westernized                              68.3%              69.0%

Missing (number and share of missing values)

Variable                                 curated_md_report  concatenated_md_report
age_years                                8409 (38.4%)       9035 (40.0%)
biomarker                                18841 (86.1%)      19548 (86.5%)
dietary_restriction                      21464 (98.1%)      22171 (98.2%)
feces_phenotype_metric                   20784 (95.0%)      21491 (95.1%)
feces_phenotype_value                    20784 (95.0%)      21491 (95.1%)
feces_phenotype_metric_ontology_term_id  20784 (95.0%)      Alert not present
fmt_role                                 21725 (99.3%)      Alert not present
fmt_id                                   21736 (99.3%)      22443 (99.4%)
sex                                      2558 (11.7%)       2558 (11.3%)
sex_ontology_term_id                     2558 (11.7%)       Alert not present
hla                                      20981 (95.9%)      Alert not present
hla_ontology_term_id                     20981 (95.9%)      Alert not present
smoker                                   18901 (86.4%)      19608 (86.8%)
smoker_ontology_term_id                  18901 (86.4%)      Alert not present
antibiotics_current_use                  7306 (33.4%)       7932 (35.1%)
treatment                                19534 (89.3%)      20241 (89.6%)
treatment_ontology_term_id               16053 (73.4%)      Alert not present
tumor_staging_ajcc                       21252 (97.1%)      21959 (97.2%)
tumor_staging_tnm                        21619 (98.8%)      22326 (98.8%)
unmetadata                               19810 (90.5%)      20517 (90.8%)
age_min                                  Alert not present  627 (2.8%)
age_max                                  Alert not present  627 (2.8%)
control                                  Alert not present  707 (3.1%)
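
The report does not say how the Imbalance percentage is derived. A minimal sketch, assuming it is one minus the Shannon entropy of the class counts normalised by the maximum possible entropy (an assumption, not confirmed by this report), applied to the age_group counts listed for curated_md_report:

```python
import math


def imbalance_score(counts):
    """1 - normalised Shannon entropy: 0 for uniform classes, near 1 for one dominant class."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Adult, Infant, Elderly, Adolescent, Children 2-11 Years Old (curated_md_report).
age_group_counts = [14924, 3272, 2357, 760, 567]
score = imbalance_score(age_group_counts)
print(f"age_group imbalance under this definition: {score:.1%}")
```

Under this definition age_group scores roughly 38%, well below the 68% to 82% reported for westernized and body_site, which would be consistent with age_group not appearing among the Imbalance alerts.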

Reproduction

Statistic                        curated_md_report           concatenated_md_report
Analysis started                 2025-03-31 03:31:04.154718  2025-03-31 03:31:10.895815
Analysis finished                2025-03-31 03:31:10.887266  2025-03-31 03:31:14.891206
Duration                         6.73 seconds                4 seconds
Software version                 ydata-profiling v4.16.1     ydata-profiling v4.16.1
Download configuration           config.json                 config.json
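
The durations follow from the start and finish timestamps. A small sketch, assuming the timestamps use the format `YYYY-MM-DD HH:MM:SS.ffffff`; the helper name is illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (started, finished) pairs copied from the Reproduction table.
runs = {
    "curated_md_report": ("2025-03-31 03:31:04.154718", "2025-03-31 03:31:10.887266"),
    "concatenated_md_report": ("2025-03-31 03:31:10.895815", "2025-03-31 03:31:14.891206"),
}


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


durations = {name: duration_seconds(*span) for name, span in runs.items()}
for name, seconds in durations.items():
    print(f"{name}: {seconds:.2f} s")
```

The results round to the 6.73 seconds and 4 seconds shown in the table.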

Variables

study_name
Text (both reports)

Statistic                        curated_md_report  concatenated_md_report
Distinct                         90                 93
Distinct (%)                     0.4%               0.4%
Missing                          0                  0
Missing (%)                      0.0%               0.0%
Memory size                      171.1 KiB          176.6 KiB

Length

Statistic                        curated_md_report  concatenated_md_report
Max length                       22                 22
Median length                    19                 19
Mean length                      12.773411          12.756242
Min length                       8                  8

Characters and Unicode

Statistic                        curated_md_report  concatenated_md_report
Total characters                 279495             288138
Distinct characters              61                 61
Distinct categories              1                  1
Distinct scripts                 1                  1
Distinct blocks                  1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic                        curated_md_report  concatenated_md_report
Unique                           0                  0
Unique (%)                       0.0%               0.0%

Sample

Row                              curated_md_report  concatenated_md_report
1st row                          AsnicarF_2017      AsnicarF_2017
2nd row                          AsnicarF_2017      AsnicarF_2017
3rd row                          AsnicarF_2017      AsnicarF_2017
4th row                          AsnicarF_2017      AsnicarF_2017
5th row                          AsnicarF_2017      AsnicarF_2017

Value counts, shown as count (frequency %)

Value                            curated_md_report  concatenated_md_report
metacardis_2020_a                1831 (8.4%)        1831 (8.1%)
shaoy_2019                       1644 (7.5%)        1644 (7.3%)
hmp_2019_ibdmdb                  1627 (7.4%)        1627 (7.2%)
lifelinesdeep_2016               1135 (5.2%)        1135 (5.0%)
asnicarf_2021                    1098 (5.0%)        1098 (4.9%)
mehtars_2018                     928 (4.2%)         928 (4.1%)
zeevid_2015                      900 (4.1%)         900 (4.0%)
vatanent_2016                    785 (3.6%)         785 (3.5%)
hmp_2012                         748 (3.4%)         748 (3.3%)
yachidas_2019                    616 (2.8%)         616 (2.7%)
Other values (80 / 83)           10569 (48.3%)      11276 (49.9%)

Most occurring characters

Counts shown as count (frequency %)

Character                        curated_md_report  concatenated_md_report
2                                27391 (9.8%)       28098 (9.8%)
_                                25877 (9.3%)       26584 (9.2%)
0                                24226 (8.7%)       24933 (8.7%)
1                                19206 (6.9%)       19913 (6.9%)
e                                16920 (6.1%)       16920 (5.9%)
a                                16355 (5.9%)       17143 (5.9%)
i                                13607 (4.9%)       14043 (4.9%)
n                                7790 (2.8%)        8033 (2.8%)
s                                7329 (2.6%)        7871 (2.7%)
r                                6667 (2.4%)        6938 (2.4%)
Other values (51 / 51)           114127 (40.8%)     117662 (40.8%)

Most occurring categories, scripts and blocks

Value                            curated_md_report  concatenated_md_report
(unknown)                        279495 (100.0%)    288138 (100.0%)

Every character falls into a single (unknown) category, script and block, so the "Most frequent character per category", "per script" and "per block" tables are identical to the Most occurring characters table.

sample_id
Text (both reports)

Statistic                        curated_md_report  concatenated_md_report
Distinct                         21704              22411
Distinct (%)                     99.2%              99.2%
Missing                          0                  0
Missing (%)                      0.0%               0.0%
Memory size                      171.1 KiB          176.6 KiB

Length

Statistic                        curated_md_report  concatenated_md_report
Max length                       57                 57
Median length                    54                 54
Mean length                      14.422878          14.524172
Min length                       2                  2

Characters and Unicode

Statistic                        curated_md_report  concatenated_md_report
Total characters                 315587             328072
Distinct characters              65                 65
Distinct categories              1                  1
Distinct scripts                 1                  1
Distinct blocks                  1                  1

Unique

Statistic                        curated_md_report  concatenated_md_report
Unique                           21527              22234
Unique (%)                       98.4%              98.4%

Sample

Row                              curated_md_report  concatenated_md_report
1st row                          MV_FEI1_t1Q14      MV_FEI1_t1Q14
2nd row                          MV_FEI2_t1Q14      MV_FEI2_t1Q14
3rd row                          MV_FEI3_t1Q14      MV_FEI3_t1Q14
4th row                          MV_FEI4_t1Q14      MV_FEI4_t1Q14
5th row                          MV_FEI4_t2Q15      MV_FEI4_t2Q15

Value counts, shown as count (frequency %)

curated_md_report
mh0039                           2 (< 0.1%)
mh0081                           2 (< 0.1%)
mh0059                           2 (< 0.1%)
mh0048                           2 (< 0.1%)
mh0126                           2 (< 0.1%)
mh0040                           2 (< 0.1%)
mh0041                           2 (< 0.1%)
mh0042                           2 (< 0.1%)
mh0043                           2 (< 0.1%)
mh0044                           2 (< 0.1%)
Other values (21694)             21861 (99.9%)

concatenated_md_report
mh0132                           2 (< 0.1%)
mh0127                           2 (< 0.1%)
mh0143                           2 (< 0.1%)
mh0144                           2 (< 0.1%)
mh0145                           2 (< 0.1%)
mh0146                           2 (< 0.1%)
mh0148                           2 (< 0.1%)
mh0139                           2 (< 0.1%)
mh0149                           2 (< 0.1%)
mh0079                           2 (< 0.1%)
Other values (22401)             22568 (99.9%)

Most occurring characters

Counts shown as count (frequency %)

Character                        curated_md_report  concatenated_md_report
0                                40630 (12.9%)      44533 (13.6%)
1                                26414 (8.4%)       27806 (8.5%)
2                                16856 (5.3%)       17408 (5.3%)
6                                15293 (4.8%)       16037 (4.9%)
4                                14314 (4.5%)       14743 (4.5%)
9                                14047 (4.5%)       14265 (4.3%)
S                                13725 (4.3%)       13725 (4.2%)
M                                13660 (4.3%)       13741 (4.2%)
7                                13500 (4.3%)       14162 (4.3%)
3                                12757 (4.0%)       13528 (4.1%)
Other values (55 / 55)           134391 (42.6%)     138124 (42.1%)

Most occurring categories, scripts and blocks

Value                            curated_md_report  concatenated_md_report
(unknown)                        315587 (100.0%)    328072 (100.0%)

Every character falls into a single (unknown) category, script and block, so the "Most frequent character per category", "per script" and "per block" tables are identical to the Most occurring characters table.

age_years
Real number (ℝ)

Statistic                        curated_md_report  concatenated_md_report
Distinct                         464                465
Distinct (%)                     3.4%               3.4%
Missing                          8409               9035
Missing (%)                      38.4%              40.0%
Infinite                         0                  0
Infinite (%)                     0.0%               0.0%
Mean                             32.742141          32.896563
Minimum                          0                  0
Maximum                          92                 92
Zeros                            43                 43
Zeros (%)                        0.2%               0.2%
Negative                         0                  0
Negative (%)                     0.0%               0.0%
Memory size                      171.1 KiB          176.6 KiB

Quantile statistics

Statistic                        curated_md_report  concatenated_md_report
Minimum                          0                  0
5-th percentile                  0.019178082        0.019178082
Q1                               6                  6
median                           32                 32
Q3                               54                 54
95-th percentile                 71                 71
Maximum                          92                 92
Range                            92                 92
Interquartile range (IQR)        48                 48

Descriptive statistics

Statistic                        curated_md_report  concatenated_md_report
Standard deviation               24.79753           24.817257
Coefficient of variation (CV)    0.75735823         0.75440274
Kurtosis                         -1.2321976         -1.2354848
Mean                             32.742141          32.896563
Median Absolute Deviation (MAD)  23                 23
Skewness                         0.093306608        0.08417056
Sum                              441102.12          445847.12
Variance                         614.91748          615.89626
Monotonicity                     Not monotonic      Not monotonic
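
Several of these statistics are tied together by simple identities, which gives a quick consistency check on the extracted numbers. The sketch below uses the curated_md_report values for age_years and assumes the mean is taken over non-missing observations only.

```python
import math

# Values copied from the age_years tables for curated_md_report.
observations, missing = 21881, 8409
mean, std, variance = 32.742141, 24.79753, 614.91748
cv, total = 0.75735823, 441102.12
q1, q3, iqr = 6, 54, 48

non_missing = observations - missing

checks = {
    "mean == sum / non-missing count": math.isclose(total / non_missing, mean, rel_tol=1e-5),
    "variance == std ** 2": math.isclose(std ** 2, variance, rel_tol=1e-5),
    "CV == std / mean": math.isclose(std / mean, cv, rel_tol=1e-5),
    "IQR == Q3 - Q1": q3 - q1 == iqr,
}

for label, ok in checks.items():
    print(f"{label}: {'ok' if ok else 'MISMATCH'}")
```

All four identities hold for the reported figures.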
Histogram with fixed size bins (bins=50)

Common values, shown as count (frequency %)

Value                            curated_md_report  concatenated_md_report
0.01917808219                    539 (2.5%)         539 (2.4%)
1                                418 (1.9%)         418 (1.9%)
0.05753424658                    335 (1.5%)         335 (1.5%)
0.01095890411                    316 (1.4%)         316 (1.4%)
51                               280 (1.3%)         287 (1.3%)
23                               243 (1.1%)         243 (1.1%)
26                               232 (1.1%)         232 (1.0%)
32                               231 (1.1%)         231 (1.0%)
27                               226 (1.0%)         226 (1.0%)
50                               224 (1.0%)         226 (1.0%)
Other values (454 / 455)         10428 (47.7%)      10500 (46.5%)
(Missing)                        8409 (38.4%)       9035 (40.0%)

Extreme values (10 smallest), shown as count (frequency %)

Value                            curated_md_report  concatenated_md_report
0                                43 (0.2%)          43 (0.2%)
0.002739726027                   21 (0.1%)          21 (0.1%)
0.005479452055                   24 (0.1%)          24 (0.1%)
0.008219178082                   33 (0.2%)          33 (0.1%)
0.01095890411                    316 (1.4%)         316 (1.4%)
0.01369863014                    29 (0.1%)          29 (0.1%)
0.01643835616                    12 (0.1%)          12 (0.1%)
0.01917808219                    539 (2.5%)         539 (2.4%)
0.02191780822                    26 (0.1%)          26 (0.1%)
0.02465753425                    22 (0.1%)          22 (0.1%)
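
The fractional ages in these tables are all close to whole multiples of 1/365, which suggests (an inference, not something the report states) that infant ages were recorded in days and converted to years. A quick sketch of that check:

```python
# A few of the smallest age_years values listed in this report.
small_ages_years = [0.002739726027, 0.01095890411, 0.01917808219, 0.05753424658]

# Convert back to days under the assumed 365-day year.
ages_days = [round(age * 365) for age in small_ages_years]

for years, days in zip(small_ages_years, ages_days):
    residual = abs(years * 365 - days)
    print(f"{years} years ~ {days} days (residual {residual:.1e})")
```

Under that assumption, the most common fractional value, 0.01917808219, corresponds to an age of 7 days.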

age_min
Real number (ℝ)

Statistic                        curated_md_report  concatenated_md_report
Distinct                         464                465
Distinct (%)                     2.1%               2.1%
Missing                          1                  627
Missing (%)                      < 0.1%             2.8%
Infinite                         0                  0
Infinite (%)                     0.0%               0.0%
Mean                             28.489357          28.600342
Minimum                          0                  0
Maximum                          92                 92
Zeros                            187                187
Zeros (%)                        0.9%               0.8%
Negative                         0                  0
Negative (%)                     0.0%               0.0%
Memory size                      171.1 KiB          176.6 KiB

Quantile statistics

Statistic                        curated_md_report  concatenated_md_report
Minimum                          0                  0
5-th percentile                  0.019178082        0.019178082
Q1                               18                 18
median                           18                 18
Q3                               46                 46
95-th percentile                 69                 69
Maximum                          92                 92
Range                            92                 92
Interquartile range (IQR)        28                 28

Descriptive statistics

Statistic                        curated_md_report  concatenated_md_report
Standard deviation               21.920538          21.965644
Coefficient of variation (CV)    0.76942902         0.7680203
Kurtosis                         -0.6680911         -0.68487874
Mean                             28.489357          28.600342
Median Absolute Deviation (MAD)  14                 14
Skewness                         0.64477406         0.63645516
Sum                              623347.12          628092.12
Variance                         480.50998          482.4895
Monotonicity                     Not monotonic      Not monotonic
Histogram with fixed size bins (bins=50)

Common values, shown as count (frequency %)

Value                            curated_md_report  concatenated_md_report
18                               7465 (34.1%)       7465 (33.0%)
65                               922 (4.2%)         926 (4.1%)
0.01917808219                    539 (2.5%)         539 (2.4%)
1                                418 (1.9%)         418 (1.9%)
0.05753424658                    335 (1.5%)         335 (1.5%)
0.01095890411                    316 (1.4%)         316 (1.4%)
51                               280 (1.3%)         287 (1.3%)
23                               243 (1.1%)         243 (1.1%)
26                               232 (1.1%)         232 (1.0%)
32                               231 (1.1%)         231 (1.0%)
Other values (454 / 455)         10899 (49.8%)      10969 (48.6%)
(Missing)                        not listed         627 (2.8%)

Extreme values (10 smallest), shown as count (frequency %)

Value                            curated_md_report  concatenated_md_report
0                                187 (0.9%)         187 (0.8%)
0.002739726027                   21 (0.1%)          21 (0.1%)
0.005479452055                   24 (0.1%)          24 (0.1%)
0.008219178082                   33 (0.2%)          33 (0.1%)
0.01095890411                    316 (1.4%)         316 (1.4%)
0.01369863014                    29 (0.1%)          29 (0.1%)
0.01643835616                    12 (0.1%)          12 (0.1%)
0.01917808219                    539 (2.5%)         539 (2.4%)
0.02191780822                    26 (0.1%)          26 (0.1%)
0.02465753425                    22 (0.1%)          22 (0.1%)
age_max
Real number (ℝ)

Statistic                        curated_md_report  concatenated_md_report
Distinct                         465                466
Distinct (%)                     2.1%               2.1%
Missing                          1                  627
Missing (%)                      < 0.1%             2.8%
Infinite                         0                  0
Infinite (%)                     0.0%               0.0%
Mean                             46.680581          46.724472
Minimum                          0                  0
Maximum                          130                130
Zeros                            43                 43
Zeros (%)                        0.2%               0.2%
Negative                         0                  0
Negative (%)                     0.0%               0.0%
Memory size                      171.1 KiB          176.6 KiB

Quantile statistics

Statistic                        curated_md_report  concatenated_md_report
Minimum                          0                  0
5-th percentile                  0.030136986        0.030136986
Q1                               24                 24
median                           57                 57
Q3                               65                 65
95-th percentile                 75                 75
Maximum                          130                130
Range                            130                130
Interquartile range (IQR)        41                 41

Descriptive statistics

Statistic                        curated_md_report  concatenated_md_report
Standard deviation               29.434893          29.396631
Coefficient of variation (CV)    0.63055969         0.62914849
Kurtosis                         0.3773218          0.38222039
Mean                             46.680581          46.724472
Median Absolute Deviation (MAD)  11                 11
Skewness                         0.16965504         0.16608084
Sum                              1021371.1          1026116.1
Variance                         866.41293          864.16191
Monotonicity                     Not monotonic      Not monotonic
Histogram with fixed size bins (bins=50)

Common values, shown as count (frequency %)

Value                            curated_md_report  concatenated_md_report
65                               7595 (34.7%)       7599 (33.6%)
130                              743 (3.4%)         743 (3.3%)
0.01917808219                    539 (2.5%)         539 (2.4%)
1                                418 (1.9%)         418 (1.9%)
0.05753424658                    335 (1.5%)         335 (1.5%)
0.01095890411                    316 (1.4%)         316 (1.4%)
51                               280 (1.3%)         287 (1.3%)
23                               243 (1.1%)         243 (1.1%)
2                                239 (1.1%)         239 (1.1%)
26                               232 (1.1%)         232 (1.0%)
Other values (455 / 456)         10940 (50.0%)      11010 (48.7%)
(Missing)                        not listed         627 (2.8%)

Extreme values (10 smallest), shown as count (frequency %)

Value                            curated_md_report  concatenated_md_report
0                                43 (0.2%)          43 (0.2%)
0.002739726027                   21 (0.1%)          21 (0.1%)
0.005479452055                   24 (0.1%)          24 (0.1%)
0.008219178082                   33 (0.2%)          33 (0.1%)
0.01095890411                    316 (1.4%)         316 (1.4%)
0.01369863014                    29 (0.1%)          29 (0.1%)
0.01643835616                    12 (0.1%)          12 (0.1%)
0.01917808219                    539 (2.5%)         539 (2.4%)
0.02191780822                    26 (0.1%)          26 (0.1%)
0.02465753425                    22 (0.1%)          22 (0.1%)

age_group
Categorical

Statistic                        curated_md_report  concatenated_md_report
Distinct                         5                  5
Distinct (%)                     < 0.1%             < 0.1%
Missing                          1                  1
Missing (%)                      < 0.1%             < 0.1%
Memory size                      171.1 KiB          176.6 KiB

Length

Statistic                        curated_md_report  concatenated_md_report
Max length                       23                 23
Median length                    5                  5
Mean length                      6.0051188          5.9824678
Min length                       5                  5

Characters and Unicode

Statistic                        curated_md_report  concatenated_md_report
Total characters                 131392             135126
Distinct characters              25                 25
Distinct categories              1                  1
Distinct scripts                 1                  1
Distinct blocks                  1                  1

Unique

Statistic                        curated_md_report  concatenated_md_report
Unique                           0                  0
Unique (%)                       0.0%               0.0%

Sample

Row                              curated_md_report  concatenated_md_report
1st row                          Infant             Infant
2nd row                          Infant             Infant
3rd row                          Infant             Infant
4th row                          Infant             Infant
5th row                          Infant             Infant

Common Values

Counts shown as count (frequency %)

Value                            curated_md_report  concatenated_md_report
Adult                            14924 (68.2%)      15454 (68.4%)
Infant                           3272 (15.0%)       3427 (15.2%)
Elderly                          2357 (10.8%)       2379 (10.5%)
Adolescent                       760 (3.5%)         760 (3.4%)
Children 2-11 Years Old          567 (2.6%)         567 (2.5%)
(Missing)                        1 (< 0.1%)         1 (< 0.1%)

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for curated_md_report and concatenated_md_report]

Value counts, shown as count (frequency %)

Value                            curated_md_report  concatenated_md_report
adult                            14924 (63.3%)      15454 (63.6%)
infant                           3272 (13.9%)       3427 (14.1%)
elderly                          2357 (10.0%)       2379 (9.8%)
adolescent                       760 (3.2%)         760 (3.1%)
children                         567 (2.4%)         567 (2.3%)
2-11                             567 (2.4%)         567 (2.3%)
years                            567 (2.4%)         567 (2.3%)
old                              567 (2.4%)         567 (2.3%)

Most occurring characters

Counts shown as count (frequency %)

Character                        curated_md_report  concatenated_md_report
l                                21532 (16.4%)      22106 (16.4%)
d                                19175 (14.6%)      19727 (14.6%)
t                                18956 (14.4%)      19641 (14.5%)
A                                15684 (11.9%)      16214 (12.0%)
u                                14924 (11.4%)      15454 (11.4%)
n                                7871 (6.0%)        8181 (6.1%)
e                                5011 (3.8%)        5033 (3.7%)
a                                3839 (2.9%)        3994 (3.0%)
r                                3491 (2.7%)        3513 (2.6%)
I                                3272 (2.5%)        3427 (2.5%)
Other values (15 / 15)           17637 (13.4%)      17836 (13.2%)

Most occurring categories, scripts and blocks

Value                            curated_md_report  concatenated_md_report
(unknown)                        131392 (100.0%)    135126 (100.0%)

Every character falls into a single (unknown) category, script and block, so the "Most frequent character per category", "per script" and "per block" tables are identical to the Most occurring characters table.

age_group_ontology_term_id
Categorical (curated_md_report only)

Statistic                        curated_md_report
Distinct                         5
Distinct (%)                     < 0.1%
Missing                          1
Missing (%)                      < 0.1%
Memory size                      171.1 KiB

Length

Statistic                        curated_md_report
Max length                       11
Median length                    11
Mean length                      11
Min length                       11

Characters and Unicode

Statistic                        curated_md_report
Total characters                 240680
Distinct characters              14
Distinct categories              1
Distinct scripts                 1
Distinct blocks                  1

Unique

Statistic                        curated_md_report
Unique                           0
Unique (%)                       0.0%

Sample

Row                              curated_md_report
1st row                          NCIT:C27956
2nd row                          NCIT:C27956
3rd row                          NCIT:C27956
4th row                          NCIT:C27956
5th row                          NCIT:C27956

Common Values

Value                            Count   Frequency (%)
NCIT:C49685                      14924   68.2%
NCIT:C27956                      3272    15.0%
NCIT:C16268                      2357    10.8%
NCIT:C27954                      760     3.5%
NCIT:C49683                      567     2.6%
(Missing)                        1       < 0.1%

Length

[Figure: histogram of lengths of the category]

Value                            Count   Frequency (%)
ncit:c49685                      14924   68.2%
ncit:c27956                      3272    15.0%
ncit:c16268                      2357    10.8%
ncit:c27954                      760     3.5%
ncit:c49683                      567     2.6%

Most occurring characters

Character                        Count   Frequency (%)
C                                43760   18.2%
6                                23477   9.8%
N                                21880   9.1%
I                                21880   9.1%
T                                21880   9.1%
:                                21880   9.1%
9                                19523   8.1%
5                                18956   7.9%
8                                17848   7.4%
4                                16251   6.8%
Other values (4)                 13345   5.5%

Most occurring categories, scripts and blocks

Value                            Count   Frequency (%)
(unknown)                        240680  100.0%

Every character falls into a single (unknown) category, script and block, so the "Most frequent character per category", "per script" and "per block" tables are identical to the Most occurring characters table.

biomarker
Text (both reports)

Statistic                        curated_md_report  concatenated_md_report
Distinct                         2805               2805
Distinct (%)                     92.3%              92.3%
Missing                          18841              19548
Missing (%)                      86.1%              86.5%
Memory size                      171.1 KiB          176.6 KiB

Length

 curated_md_reportconcatenated_md_report
Max length628628
Median length616616
Mean length198.78684198.78684
Min length2525

Characters and Unicode

 curated_md_reportconcatenated_md_report
Total characters604312604312
Distinct characters6262
Distinct categories11 ?
Distinct scripts11 ?
Distinct blocks11 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

 curated_md_reportconcatenated_md_report
Unique27022702 ?
Unique (%)88.9%88.9%

Sample

 curated_md_reportconcatenated_md_report
1st row
  curated_md_report:      Alanine_Aminotransferase_in_U/L:45;Albumin_in_g/dL:49;Aspartate_Aminotransferase_in_U/L:34;Creatine_in_umol/L:56;Erythrocyte_Sedimentation_Rate_in_mm/hr:4;Globulin_Protein_in_g/L:32;High_Sensitivity_C-Reactive_Protein_in_mg/L:5;Urea_Nitrogen_in_mmol/L:4.1
  concatenated_md_report: Alanine_Aminotransferase_in_U/L:45;Albumin_in_g/dL:49;Aspartate_Aminotransferase_in_U/L:34;Creatine_in_umol/L:56;Erythrocyte_Sedimentation_Rate_in_mm/hr:4;Globulin_Protein_in_g/L:32;High_Sensitivity_C-Reactive_Protein_in_mg/L:5;Urea_Nitrogen_in_mmol/L:4.1
2nd row
  curated_md_report:      Alanine_Aminotransferase_in_U/L:54;Albumin_in_g/dL:44.3;Aspartate_Aminotransferase_in_U/L:36;Creatine_in_umol/L:96;Erythrocyte_Sedimentation_Rate_in_mm/hr:3;Globulin_Protein_in_g/L:20.8;High_Sensitivity_C-Reactive_Protein_in_mg/L:8;Urea_Nitrogen_in_mmol/L:6.87
  concatenated_md_report: Alanine_Aminotransferase_in_U/L:54;Albumin_in_g/dL:44.3;Aspartate_Aminotransferase_in_U/L:36;Creatine_in_umol/L:96;Erythrocyte_Sedimentation_Rate_in_mm/hr:3;Globulin_Protein_in_g/L:20.8;High_Sensitivity_C-Reactive_Protein_in_mg/L:8;Urea_Nitrogen_in_mmol/L:6.87
3rd row
  curated_md_report:      Alanine_Aminotransferase_in_U/L:34;Albumin_in_g/dL:49;Aspartate_Aminotransferase_in_U/L:21;Creatine_in_umol/L:75;Erythrocyte_Sedimentation_Rate_in_mm/hr:24;Globulin_Protein_in_g/L:18.9;High_Sensitivity_C-Reactive_Protein_in_mg/L:8;Urea_Nitrogen_in_mmol/L:3.78
  concatenated_md_report: Alanine_Aminotransferase_in_U/L:34;Albumin_in_g/dL:49;Aspartate_Aminotransferase_in_U/L:21;Creatine_in_umol/L:75;Erythrocyte_Sedimentation_Rate_in_mm/hr:24;Globulin_Protein_in_g/L:18.9;High_Sensitivity_C-Reactive_Protein_in_mg/L:8;Urea_Nitrogen_in_mmol/L:3.78
4th row
  curated_md_report:      Alanine_Aminotransferase_in_U/L:22;Albumin_in_g/dL:40.1;Aspartate_Aminotransferase_in_U/L:29;Creatine_in_umol/L:64;Erythrocyte_Sedimentation_Rate_in_mm/hr:11;Globulin_Protein_in_g/L:31.9;High_Sensitivity_C-Reactive_Protein_in_mg/L:2.3;Urea_Nitrogen_in_mmol/L:4.01
  concatenated_md_report: Alanine_Aminotransferase_in_U/L:22;Albumin_in_g/dL:40.1;Aspartate_Aminotransferase_in_U/L:29;Creatine_in_umol/L:64;Erythrocyte_Sedimentation_Rate_in_mm/hr:11;Globulin_Protein_in_g/L:31.9;High_Sensitivity_C-Reactive_Protein_in_mg/L:2.3;Urea_Nitrogen_in_mmol/L:4.01
5th row
  curated_md_report:      Alanine_Aminotransferase_in_U/L:18;Albumin_in_g/dL:41.6;Aspartate_Aminotransferase_in_U/L:18;Creatine_in_umol/L:80.4;Erythrocyte_Sedimentation_Rate_in_mm/hr:18;Globulin_Protein_in_g/L:26.6;High_Sensitivity_C-Reactive_Protein_in_mg/L:3.9;Urea_Nitrogen_in_mmol/L:5.53
  concatenated_md_report: Alanine_Aminotransferase_in_U/L:18;Albumin_in_g/dL:41.6;Aspartate_Aminotransferase_in_U/L:18;Creatine_in_umol/L:80.4;Erythrocyte_Sedimentation_Rate_in_mm/hr:18;Globulin_Protein_in_g/L:26.6;High_Sensitivity_C-Reactive_Protein_in_mg/L:3.9;Urea_Nitrogen_in_mmol/L:5.53
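
Each sampled cell packs several measurements into one string of name:value pairs joined by semicolons. The sketch below assumes that layout and uses a hypothetical helper, `parse_cell`, to split a cell into a dictionary. Values are kept as lists of strings because other cells in this column (the autoantibody entries in the value table) carry semicolon-separated labels rather than numbers. Treating a piece without a colon as a continuation of the previous name is an assumption, not something the report states.

```python
def parse_cell(cell: str) -> dict[str, list[str]]:
    """Split 'name:value;name:value' into {name: [values]} (hypothetical helper)."""
    parsed: dict[str, list[str]] = {}
    last_name = None
    for piece in cell.split(";"):
        # Split on the last colon so names such as '..._in_U/L' stay intact.
        name, sep, value = piece.rpartition(":")
        if sep:
            last_name = name
            parsed[name] = [value]
        elif last_name is not None:
            # Assumption: a piece without ':' is another value of the previous name.
            parsed[last_name].append(piece)
    return parsed


numeric_cell = "Albumin_in_g/dL:49;Urea_Nitrogen_in_mmol/L:4.1"
label_cell = "autoantibody_titer_measurement_(procedure):iaa;gada"
numeric_parsed = parse_cell(numeric_cell)
label_parsed = parse_cell(label_cell)
```

Numeric cells can then be converted with `float(values[0])` where that is appropriate.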
Identical in curated_md_report and concatenated_md_report:

Count   Frequency (%)   Value
29      1.0%            diastolic_blood_pressure_in_mm/hg:80;systolic_blood_pressure_in_mm/hg:120
28      0.9%            autoantibody_titer_measurement_(procedure):iaa;gada;ia-2a;znt8a;ica
13      0.4%            diastolic_blood_pressure_in_mm/hg:70;systolic_blood_pressure_in_mm/hg:110
12      0.4%            autoantibody_titer_measurement_(procedure):iaa;gada
10      0.3%            autoantibody_titer_measurement_(procedure):iaa;gada;znt8a;ica
9       0.3%            autoantibody_titer_measurement_(procedure):iaa;ia-2a;znt8a;ica
7       0.2%            autoantibody_titer_measurement_(procedure):iaa;gada;ia-2a;ica
7       0.2%            cholesterol_in_mg/dl:211.1382;creatinine_in_umol/l:80.19;high_density_lipoprotein_cholesterol_in_mg/dl:51.0444;ldl_particles_in_mg/dl:128.3844;triglyceride_in_mg/dl:158.5403
7       0.2%            autoantibody_titer_measurement_(procedure):iaa;ica
5       0.2%            diastolic_blood_pressure_in_mm/hg:60;systolic_blood_pressure_in_mm/hg:100
2913    95.8%           Other values (2795)

Most occurring characters

                    curated_md_report        concatenated_md_report
Value               Count    Frequency (%)   Count    Frequency (%)
_                   55291    9.1%            55291    9.1%
i                   48146    8.0%            48146    8.0%
e                   38229    6.3%            38229    6.3%
n                   33837    5.6%            33837    5.6%
o                   31170    5.2%            31170    5.2%
l                   28005    4.6%            28005    4.6%
m                   24744    4.1%            24744    4.1%
t                   23738    3.9%            23738    3.9%
r                   22611    3.7%            22611    3.7%
g                   18271    3.0%            18271    3.0%
Other values (52)   280270   46.4%           280270   46.4%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown):

            curated_md_report        concatenated_md_report
Value       Count    Frequency (%)   Count    Frequency (%)
(unknown)   604312   100.0%          604312   100.0%

Most frequent character per category, script, and block

Because (unknown) is the only category, script, and block, each of these tables repeats the "Most occurring characters" table above for the corresponding dataset.

body_site
Categorical

               curated_md_report   concatenated_md_report
Distinct       24                  24
Distinct (%)   0.1%                0.1%
Missing        0                   0
Missing (%)    0.0%                0.0%
Memory size    171.1 KiB           176.6 KiB

Value                          curated_md_report   concatenated_md_report
feces                          19400               20107
feces;rectum                   923                 923
skin epidermis                 373                 373
oral cavity                    220                 220
oral cavity;dorsum of tongue   198                 198
Other values (19)              767                 767

Length

                curated_md_report   concatenated_md_report
Max length      39                  39
Median length   5                   5
Mean length     6.5951282           6.545201
Min length      4                   4

Characters and Unicode

                      curated_md_report   concatenated_md_report
Total characters      144308              147843
Distinct characters   26                  26
Distinct categories   1                   1
Distinct scripts      1                   1
Distinct blocks       1                   1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
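
The mean length reported for body_site can be cross-checked against the other figures in this section: with no missing values, it should equal total characters divided by the number of observations (21881 and 22588 in the overview). The sketch below performs that check; the variable names are illustrative only.

```python
# Figures copied from the report: (total characters, observations, reported mean length).
body_site_stats = {
    "curated_md_report": (144308, 21881, 6.5951282),
    "concatenated_md_report": (147843, 22588, 6.545201),
}

recomputed = {
    name: total_chars / n_rows
    for name, (total_chars, n_rows, _) in body_site_stats.items()
}

for name, (_, _, reported) in body_site_stats.items():
    # Allow for the rounding applied in the report.
    print(name, recomputed[name], abs(recomputed[name] - reported) < 1e-6)
```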

Unique

             curated_md_report   concatenated_md_report
Unique       1                   1
Unique (%)   < 0.1%              < 0.1%

Sample

          curated_md_report   concatenated_md_report
1st row   feces               feces
2nd row   feces               feces
3rd row   feces               feces
4th row   feces               feces
5th row   feces               feces

Common Values

                                          curated_md_report       concatenated_md_report
Value                                     Count   Frequency (%)   Count   Frequency (%)
feces                                     19400   88.7%           20107   89.0%
feces;rectum                              923     4.2%            923     4.1%
skin epidermis                            373     1.7%            373     1.7%
oral cavity                               220     1.0%            220     1.0%
oral cavity;dorsum of tongue              198     0.9%            198     0.9%
oral cavity;subgingival dental plaque     168     0.8%            168     0.7%
oral cavity;supragingival dental plaque   127     0.6%            127     0.6%
oral cavity;buccal mucosa                 119     0.5%            119     0.5%
nasal cavity;anterior naris               93      0.4%            93      0.4%
vagina;posterior fornix of vagina         62      0.3%            62      0.3%
Other values (14)                         198     0.9%            198     0.9%
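
Several body_site values combine two sites with a semicolon (for example feces;rectum). A minimal sketch, assuming the semicolon is purely a separator, expands the ten listed curated_md_report values into per-site counts. The 198 observations under "Other values (14)" are not itemised in the report and are left out, so these totals are lower bounds.

```python
from collections import Counter

# Top ten curated_md_report values, copied from the Common Values table.
common_values = {
    "feces": 19400,
    "feces;rectum": 923,
    "skin epidermis": 373,
    "oral cavity": 220,
    "oral cavity;dorsum of tongue": 198,
    "oral cavity;subgingival dental plaque": 168,
    "oral cavity;supragingival dental plaque": 127,
    "oral cavity;buccal mucosa": 119,
    "nasal cavity;anterior naris": 93,
    "vagina;posterior fornix of vagina": 62,
}

site_counts = Counter()
for value, count in common_values.items():
    for site in value.split(";"):
        # Each observation contributes once to every site it names.
        site_counts[site] += count

for site, count in site_counts.most_common(3):
    print(site, count)
```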

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

curated_md_report


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

concatenated_md_report


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
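
The message above indicates that the frequency plot is skipped when a variable has more distinct categories than the setting config.plot.cat_freq.max_unique allows. The sketch below only illustrates that rule. The function name, the threshold value of 10, and the inclusive comparison are assumptions and are not taken from the report or confirmed against the library.

```python
def should_plot_frequencies(n_distinct: int, max_unique: int) -> bool:
    """Return True when a category frequency plot would be drawn (illustrative rule)."""
    # Assumption: the plot is drawn only while the distinct count stays within the limit.
    return n_distinct <= max_unique


ASSUMED_MAX_UNIQUE = 10  # hypothetical value of config.plot.cat_freq.max_unique

# Distinct counts as reported: body_site has 24, country has 42.
decisions = {
    "body_site": should_plot_frequencies(24, ASSUMED_MAX_UNIQUE),
    "country": should_plot_frequencies(42, ASSUMED_MAX_UNIQUE),
    "three-category variable": should_plot_frequencies(3, ASSUMED_MAX_UNIQUE),
}
print(decisions)
```

This is consistent with the report, where variables with 24 or 42 distinct values show the message and variables with 3 distinct values show a plot.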
                    curated_md_report       concatenated_md_report
Value               Count   Frequency (%)   Count   Frequency (%)
feces               19400   78.1%           20107   78.7%
feces;rectum        923     3.7%            923     3.6%
oral                857     3.5%            857     3.4%
skin                504     2.0%            504     2.0%
epidermis           373     1.5%            373     1.5%
plaque              295     1.2%            295     1.2%
dental              295     1.2%            295     1.2%
of                  260     1.0%            260     1.0%
cavity              220     0.9%            220     0.9%
cavity;dorsum       198     0.8%            198     0.8%
Other values (31)   1502    6.0%            1502    5.9%
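
The table above counts lowercase tokens, and entries such as cavity;dorsum and feces;rectum suggest that values are split on whitespace only, so semicolon-joined sites stay fused. The sketch below reproduces that behaviour on a few values. It is an inference from the table, not the library's confirmed tokenizer, which may do more (for example, strip punctuation).

```python
from collections import Counter


def word_counts(values: list[str]) -> Counter:
    """Count lowercase, whitespace-separated tokens across all values."""
    counts: Counter = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


sample_values = [
    "oral cavity;dorsum of tongue",
    "oral cavity",
    "skin epidermis",
    "feces;rectum",
]
counts = word_counts(sample_values)
print(counts.most_common(2))
```

Splitting on the semicolon first would give per-site counts instead, which is usually the more useful view for multi-site samples.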

Most occurring characters

                    curated_md_report       concatenated_md_report
Value               Count   Frequency (%)   Count   Frequency (%)
e                   43731   30.3%           45145   30.5%
c                   22619   15.7%           23326   15.8%
s                   22217   15.4%           22924   15.5%
f                   20689   14.3%           21396   14.5%
a                   3939    2.7%            3939    2.7%
i                   3707    2.6%            3707    2.5%
r                   3313    2.3%            3313    2.2%
(space)             2946    2.0%            2946    2.0%
t                   2638    1.8%            2638    1.8%
u                   2201    1.5%            2201    1.5%
Other values (16)   16308   11.3%           16308   11.0%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown):

            curated_md_report        concatenated_md_report
Value       Count    Frequency (%)   Count    Frequency (%)
(unknown)   144308   100.0%          147843   100.0%

Most frequent character per category, script, and block

Because (unknown) is the only category, script, and block, each of these tables repeats the "Most occurring characters" table above for the corresponding dataset.

body_site_ontology_term_id
Categorical

Distinct       24
Distinct (%)   0.1%
Missing        0
Missing (%)    0.0%
Memory size    171.1 KiB

Value                           Count
UBERON:0001988                  19400
UBERON:0001988;UBERON:0001052   923
UBERON:0001003                  373
UBERON:0000167                  220
UBERON:0000167;UBERON:0009471   198
Other values (19)               767

Length

Max length      29
Median length   14
Mean length     15.281934
Min length      14

Characters and Unicode

Total characters      334384
Distinct characters   18
Distinct categories   1
Distinct scripts      1
Distinct blocks       1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique       1
Unique (%)   < 0.1%

Sample

1st row   UBERON:0001988
2nd row   UBERON:0001988
3rd row   UBERON:0001988
4th row   UBERON:0001988
5th row   UBERON:0001988

Common Values

Value                           Count   Frequency (%)
UBERON:0001988                  19400   88.7%
UBERON:0001988;UBERON:0001052   923     4.2%
UBERON:0001003                  373     1.7%
UBERON:0000167                  220     1.0%
UBERON:0000167;UBERON:0009471   198     0.9%
UBERON:0000167;UBERON:0016484   168     0.8%
UBERON:0000167;UBERON:0016485   127     0.6%
UBERON:0000167;UBERON:0006956   119     0.5%
UBERON:0001707;UBERON:2001427   93      0.4%
UBERON:0000996;UBERON:0016486   62      0.3%
Other values (14)               198     0.9%

Length

[Figure: Histogram of lengths of the category]

Value                           Count   Frequency (%)
uberon:0001988                  19400   88.7%
uberon:0001988;uberon:0001052   923     4.2%
uberon:0001003                  373     1.7%
uberon:0000167                  220     1.0%
uberon:0000167;uberon:0009471   198     0.9%
uberon:0000167;uberon:0016484   168     0.8%
uberon:0000167;uberon:0016485   127     0.6%
uberon:0000167;uberon:0006956   119     0.5%
uberon:0001707;uberon:2001427   93      0.4%
uberon:0000996;uberon:0016486   62      0.3%
Other values (14)               198     0.9%

Most occurring characters

Value              Count   Frequency (%)
0                  73688   22.0%
8                  41074   12.3%
U                  23751   7.1%
E                  23751   7.1%
R                  23751   7.1%
O                  23751   7.1%
N                  23751   7.1%
:                  23751   7.1%
B                  23751   7.1%
1                  23552   7.0%
Other values (8)   29813   8.9%
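
The count of 23751 shared by U, B, E, R, O, N and the colon equals the number of UBERON terms in the column, and the same figure can be recovered from the total character count. The sketch assumes that every term is 14 characters long (consistent with the minimum length of 14 and the maximum of 29, which is two terms plus one semicolon) and that terms within a row are joined by exactly one semicolon.

```python
TOTAL_CHARACTERS = 334384  # "Total characters" for this variable
N_ROWS = 21881             # observations in curated_md_report, none missing here
TERM_LENGTH = len("UBERON:0001988")  # 14, assumed constant for every term

# A row with k terms has 14*k characters plus (k - 1) semicolons.
# Summed over all rows: total = 15 * n_terms - n_rows.
numerator = TOTAL_CHARACTERS + N_ROWS
n_terms, remainder = divmod(numerator, TERM_LENGTH + 1)
n_two_term_rows_equivalent = n_terms - N_ROWS  # extra terms beyond one per row

print(n_terms, remainder, n_two_term_rows_equivalent)
```

A zero remainder means the assumptions are at least consistent with the reported totals.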

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown):

Value       Count    Frequency (%)
(unknown)   334384   100.0%

Most frequent character per category, script, and block

Because (unknown) is the only category, script, and block, each of these tables repeats the "Most occurring characters" table above.

country
Categorical

               curated_md_report   concatenated_md_report
Distinct       42                  42
Distinct (%)   0.2%                0.2%
Missing        0                   0
Missing (%)    0.0%                0.0%
Memory size    171.1 KiB           176.6 KiB

Value               curated_md_report   concatenated_md_report
United States       5350                5404
United Kingdom      3087                3087
Netherlands         1736                2091
China               1673                1673
Denmark             1301                1301
Other values (37)   8734                9032

Length

                curated_md_report   concatenated_md_report
Max length      28                  28
Median length   17                  17
Mean length     9.6552717           9.6121835
Min length      4                   4

Characters and Unicode

                      curated_md_report   concatenated_md_report
Total characters      211267              217120
Distinct characters   46                  46
Distinct categories   1                   1
Distinct scripts      1                   1
Distinct blocks       1                   1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
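
As an illustration of the character properties mentioned above, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories for one country value. It is independent of how the report computes its statistics, and the report itself lists a single (unknown) category for this variable.

```python
import unicodedata
from collections import Counter


def unicode_categories(text: str) -> Counter:
    """Count Unicode general categories (e.g. Lu, Ll, Zs) of the characters in text."""
    return Counter(unicodedata.category(ch) for ch in text)


categories = unicode_categories("United States")
# Lu = uppercase letter, Ll = lowercase letter, Zs = space separator
print(dict(categories))
```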

Unique

             curated_md_report   concatenated_md_report
Unique       3                   3
Unique (%)   < 0.1%              < 0.1%

Sample

          curated_md_report   concatenated_md_report
1st row   Italy               Italy
2nd row   Italy               Italy
3rd row   Italy               Italy
4th row   Italy               Italy
5th row   Italy               Italy

Common Values

                    curated_md_report       concatenated_md_report
Value               Count   Frequency (%)   Count   Frequency (%)
United States       5350    24.5%           5404    23.9%
United Kingdom      3087    14.1%           3087    13.7%
Netherlands         1736    7.9%            2091    9.3%
China               1673    7.6%            1673    7.4%
Denmark             1301    5.9%            1301    5.8%
Germany             988     4.5%            988     4.4%
France              915     4.2%            915     4.1%
Israel              900     4.1%            900     4.0%
Italy               853     3.9%            853     3.8%
Japan               696     3.2%            696     3.1%
Other values (32)   4382    20.0%           4680    20.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

curated_md_report


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

concatenated_md_report


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
                    curated_md_report       concatenated_md_report
Value               Count   Frequency (%)   Count   Frequency (%)
united              8572    27.5%           8626    27.0%
states              5350    17.1%           5404    16.9%
kingdom             3087    9.9%            3087    9.7%
netherlands         1736    5.6%            2091    6.5%
china               1673    5.4%            1673    5.2%
denmark             1301    4.2%            1301    4.1%
germany             988     3.2%            988     3.1%
france              915     2.9%            915     2.9%
israel              900     2.9%            900     2.8%
italy               853     2.7%            853     2.7%
Other values (38)   5841    18.7%           6139    19.2%

Most occurring characters

                    curated_md_report       concatenated_md_report
Value               Count   Frequency (%)   Count   Frequency (%)
e                   24265   11.5%           25083   11.6%
n                   23170   11.0%           23606   10.9%
t                   22763   10.8%           23280   10.7%
a                   20438   9.7%            20928   9.6%
i                   16568   7.8%            17164   7.9%
d                   15623   7.4%            16059   7.4%
s                   9386    4.4%            9795    4.5%
(space)             9335    4.4%            9389    4.3%
U                   8572    4.1%            8626    4.0%
r                   7195    3.4%            7550    3.5%
Other values (36)   53952   25.5%           55640   25.6%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown):

            curated_md_report        concatenated_md_report
Value       Count    Frequency (%)   Count    Frequency (%)
(unknown)   211267   100.0%          217120   100.0%

Most frequent character per category, script, and block

Because (unknown) is the only category, script, and block, each of these tables repeats the "Most occurring characters" table above for the corresponding dataset.

Distinct       42
Distinct (%)   0.2%
Missing        0
Missing (%)    0.0%
Memory size    171.1 KiB

Value               Count
NCIT:C17234         5350
NCIT:C17233         3087
NCIT:C16903         1736
NCIT:C16428         1673
NCIT:C16496         1301
Other values (37)   8734

Length

Max length      11
Median length   11
Mean length     11
Min length      11

Characters and Unicode

Total characters      240691
Distinct characters   15
Distinct categories   1
Distinct scripts      1
Distinct blocks       1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique       3
Unique (%)   < 0.1%

Sample

1st row   NCIT:C16761
2nd row   NCIT:C16761
3rd row   NCIT:C16761
4th row   NCIT:C16761
5th row   NCIT:C16761

Common Values

Value               Count   Frequency (%)
NCIT:C17234         5350    24.5%
NCIT:C17233         3087    14.1%
NCIT:C16903         1736    7.9%
NCIT:C16428         1673    7.6%
NCIT:C16496         1301    5.9%
NCIT:C16636         988     4.5%
NCIT:C16592         915     4.2%
NCIT:C16760         900     4.1%
NCIT:C16761         853     3.9%
NCIT:C16764         696     3.2%
Other values (32)   4382    20.0%

Length

[Figure: Histogram of lengths of the category]

Value               Count   Frequency (%)
ncit:c17234         5350    24.5%
ncit:c17233         3087    14.1%
ncit:c16903         1736    7.9%
ncit:c16428         1673    7.6%
ncit:c16496         1301    5.9%
ncit:c16636         988     4.5%
ncit:c16592         915     4.2%
ncit:c16760         900     4.1%
ncit:c16761         853     3.9%
ncit:c16764         696     3.2%
Other values (32)   4382    20.0%

Most occurring characters

Value              Count   Frequency (%)
C                  43762   18.2%
1                  24923   10.4%
N                  21881   9.1%
I                  21881   9.1%
T                  21881   9.1%
:                  21881   9.1%
6                  17818   7.4%
3                  15499   6.4%
7                  13733   5.7%
2                  12824   5.3%
Other values (5)   24608   10.2%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown):

Value       Count    Frequency (%)
(unknown)   240691   100.0%

Most frequent character per category, script, and block

Because (unknown) is the only category, script, and block, each of these tables repeats the "Most occurring characters" table above.

               curated_md_report   concatenated_md_report
Distinct       3                   3
Distinct (%)   0.7%                0.7%
Missing        21464               22171
Missing (%)    98.1%               98.2%
Memory size    171.1 KiB           176.6 KiB

Value        curated_md_report   concatenated_md_report
omnivore     332                 332
vegetarian   49                  49
vegan        36                  36

Length

                curated_md_report   concatenated_md_report
Max length      10                  10
Median length   8                   8
Mean length     7.9760192           7.9760192
Min length      5                   5

Characters and Unicode

                      curated_md_report   concatenated_md_report
Total characters      3326                3326
Distinct characters   10                  10
Distinct categories   1                   1
Distinct scripts      1                   1
Distinct blocks       1                   1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             curated_md_report   concatenated_md_report
Unique       0                   0
Unique (%)   0.0%                0.0%

Sample

          curated_md_report   concatenated_md_report
1st row   omnivore            omnivore
2nd row   omnivore            omnivore
3rd row   omnivore            omnivore
4th row   omnivore            omnivore
5th row   omnivore            omnivore

Common Values

             curated_md_report       concatenated_md_report
Value        Count   Frequency (%)   Count   Frequency (%)
omnivore     332     1.5%            332     1.5%
vegetarian   49      0.2%            49      0.2%
vegan        36      0.2%            36      0.2%
(Missing)    21464   98.1%           22171   98.2%
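
The percentages for this variable use two different denominators: the Common Values table divides by all observations (21881 and 22588 in the overview), while the token table later in this section divides by the 417 non-missing entries. The sketch below recomputes both from the reported counts; the helper name `pct` is illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as shown in the report."""
    return round(100 * part / whole, 1)


N_ROWS = {"curated_md_report": 21881, "concatenated_md_report": 22588}
MISSING = {"curated_md_report": 21464, "concatenated_md_report": 22171}

# Share of missing values over all observations.
missing_pct = {name: pct(MISSING[name], N_ROWS[name]) for name in N_ROWS}

# Shares among the non-missing entries only (same 417 rows in both datasets).
present = N_ROWS["curated_md_report"] - MISSING["curated_md_report"]
shares = {label: pct(count, present)
          for label, count in {"omnivore": 332, "vegetarian": 49, "vegan": 36}.items()}

print(missing_pct, present, shares)
```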

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

curated_md_report

[Figure: common values plot for curated_md_report]

concatenated_md_report

[Figure: common values plot for concatenated_md_report]

             curated_md_report       concatenated_md_report
Value        Count   Frequency (%)   Count   Frequency (%)
omnivore     332     79.6%           332     79.6%
vegetarian   49      11.8%           49      11.8%
vegan        36      8.6%            36      8.6%

Most occurring characters

        curated_md_report       concatenated_md_report
Value   Count   Frequency (%)   Count   Frequency (%)
o       664     20.0%           664     20.0%
e       466     14.0%           466     14.0%
n       417     12.5%           417     12.5%
v       417     12.5%           417     12.5%
i       381     11.5%           381     11.5%
r       381     11.5%           381     11.5%
m       332     10.0%           332     10.0%
a       134     4.0%            134     4.0%
g       85      2.6%            85      2.6%
t       49      1.5%            49      1.5%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown):

            curated_md_report       concatenated_md_report
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   3326    100.0%          3326    100.0%

Most frequent character per category, script, and block

Because (unknown) is the only category, script, and block, each of these tables repeats the "Most occurring characters" table above for the corresponding dataset.

               curated_md_report   concatenated_md_report
Distinct       3                   3
Distinct (%)   0.3%                0.3%
Missing        20784               21491
Missing (%)    95.0%               95.1%
Memory size    171.1 KiB           176.6 KiB

Value                                                                    curated_md_report   concatenated_md_report
Bristol stool form score (observable entity)                             834                 834
Calprotectin Measurement                                                 183                 183
Calprotectin Measurement;Harvey-Bradshaw Index Clinical Classification   80                  80

Length

                curated_md_report   concatenated_md_report
Max length      70                  70
Median length   44                  44
Mean length     42.559708           42.559708
Min length      24                  24

Characters and Unicode

                      curated_md_report   concatenated_md_report
Total characters      46688               46688
Distinct characters   31                  31
Distinct categories   1                   1
Distinct scripts      1                   1
Distinct blocks       1                   1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             curated_md_report   concatenated_md_report
Unique       0                   0
Unique (%)   0.0%                0.0%

Sample

          curated_md_report          concatenated_md_report
1st row   Calprotectin Measurement   Calprotectin Measurement
2nd row   Calprotectin Measurement   Calprotectin Measurement
3rd row   Calprotectin Measurement   Calprotectin Measurement
4th row   Calprotectin Measurement   Calprotectin Measurement
5th row   Calprotectin Measurement   Calprotectin Measurement

Common Values

                                                                         curated_md_report       concatenated_md_report
Value                                                                    Count   Frequency (%)   Count   Frequency (%)
Bristol stool form score (observable entity)                             834     3.8%            834     3.7%
Calprotectin Measurement                                                 183     0.8%            183     0.8%
Calprotectin Measurement;Harvey-Bradshaw Index Clinical Classification   80      0.4%            80      0.4%
(Missing)                                                                20784   95.0%           21491   95.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

curated_md_report

[Figure: common values plot for curated_md_report]

concatenated_md_report

[Figure: common values plot for concatenated_md_report]

                              curated_md_report       concatenated_md_report
Value                         Count   Frequency (%)   Count   Frequency (%)
bristol                       834     14.5%           834     14.5%
stool                         834     14.5%           834     14.5%
form                          834     14.5%           834     14.5%
score                         834     14.5%           834     14.5%
observable                    834     14.5%           834     14.5%
entity                        834     14.5%           834     14.5%
calprotectin                  263     4.6%            263     4.6%
measurement                   183     3.2%            183     3.2%
measurement;harvey-bradshaw   80      1.4%            80      1.4%
index                         80      1.4%            80      1.4%
Other values (2)              160     2.8%            160     2.8%

Most occurring characters

                    curated_md_report       concatenated_md_report
Value               Count   Frequency (%)   Count   Frequency (%)
o                   5347    11.5%           5347    11.5%
(space)             4673    10.0%           4673    10.0%
e                   4548    9.7%            4548    9.7%
t                   4205    9.0%            4205    9.0%
r                   4022    8.6%            4022    8.6%
s                   3839    8.2%            3839    8.2%
l                   3005    6.4%            3005    6.4%
i                   2331    5.0%            2331    5.0%
a                   1840    3.9%            1840    3.9%
b                   1668    3.6%            1668    3.6%
Other values (21)   11210   24.0%           11210   24.0%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown):

            curated_md_report       concatenated_md_report
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   46688   100.0%          46688   100.0%

Most frequent character per category, script, and block

Because (unknown) is the only category, script, and block, each of these tables repeats the "Most occurring characters" table above for the corresponding dataset.

feces_phenotype_value
Text

               curated_md_report   concatenated_md_report
Distinct       257                 257
Distinct (%)   23.4%               23.4%
Missing        20784               21491
Missing (%)    95.0%               95.1%
Memory size    171.1 KiB           176.6 KiB

Length

                curated_md_report   concatenated_md_report
Max length      10                  10
Median length   1                   1
Mean length     2.2415679           2.2415679
Min length      1                   1

Characters and Unicode

                      curated_md_report   concatenated_md_report
Total characters      2459                2459
Distinct characters   12                  12
Distinct categories   1                   1
Distinct scripts      1                   1
Distinct blocks       1                   1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             curated_md_report   concatenated_md_report
Unique       245                 245
Unique (%)   22.3%               22.3%

Sample

          curated_md_report   concatenated_md_report
1st row   234.1497            234.1497
2nd row   6.1613              6.1613
3rd row   35.26               35.26
4th row   3.7909              3.7909
5th row   5.0293              5.0293
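
feces_phenotype_value is profiled as Text even though the sampled entries look numeric. A minimal sketch of coercing such strings to numbers follows. The helper name `to_number` and the choice to map unparseable or missing entries to `None` are assumptions for illustration, not part of the report.

```python
from typing import Optional


def to_number(text: Optional[str]) -> Optional[float]:
    """Convert a text cell to float, returning None when it cannot be parsed."""
    try:
        return float(text)
    except (TypeError, ValueError):
        # TypeError covers missing cells (None); ValueError covers non-numeric text.
        return None


samples = ["234.1497", "6.1613", "35.26", "3.7909", "5.0293"]
numbers = [to_number(s) for s in samples]
print(numbers)
```

Whether a numeric reading is meaningful depends on the accompanying phenotype, since the same column appears to hold both small integer scores and continuous measurements.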

                     curated_md_report       concatenated_md_report
Value                Count   Frequency (%)   Count   Frequency (%)
4                    427     38.9%           427     38.9%
3                    200     18.2%           200     18.2%
2                    67      6.1%            67      6.1%
6                    67      6.1%            67      6.1%
5                    46      4.2%            46      4.2%
1                    21      1.9%            21      1.9%
0                    10      0.9%            10      0.9%
7                    6       0.5%            6       0.5%
70                   2       0.2%            2       0.2%
188.628              2       0.2%            2       0.2%
Other values (247)   249     22.7%           249     22.7%

Most occurring characters

                   curated_md_report       concatenated_md_report
Value              Count   Frequency (%)   Count   Frequency (%)
4                  551     22.4%           551     22.4%
3                  339     13.8%           339     13.8%
2                  279     11.3%           279     11.3%
1                  227     9.2%            227     9.2%
6                  196     8.0%            196     8.0%
5                  156     6.3%            156     6.3%
.                  149     6.1%            149     6.1%
8                  134     5.4%            134     5.4%
0                  125     5.1%            125     5.1%
7                  122     5.0%            122     5.0%
Other values (2)   181     7.4%            181     7.4%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown):

            curated_md_report       concatenated_md_report
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   2459    100.0%          2459    100.0%

Most frequent character per category, script, and block

Because (unknown) is the only category, script, and block, each of these tables repeats the "Most occurring characters" table above for the corresponding dataset.

Distinct       3
Distinct (%)   0.3%
Missing        20784
Missing (%)    95.0%
Memory size    171.1 KiB

Value                      Count
SNOMED:443172007           834
NCIT:C82005                183
NCIT:C82005;NCIT:C191036   80

Length

Max length      24
Median length   16
Mean length     15.749316
Min length      11

Characters and Unicode

Total characters      17277
Distinct characters   21
Distinct categories   1
Distinct scripts      1
Distinct blocks       1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   NCIT:C82005
2nd row   NCIT:C82005
3rd row   NCIT:C82005
4th row   NCIT:C82005
5th row   NCIT:C82005

Common Values

Value                      Count   Frequency (%)
SNOMED:443172007           834     3.8%
NCIT:C82005                183     0.8%
NCIT:C82005;NCIT:C191036   80      0.4%
(Missing)                  20784   95.0%
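
The counts here (834, 183, 80) match those of the phenotype labels reported earlier, which suggests that the labels and these ontology identifiers are parallel semicolon-separated lists. Assuming that positional pairing holds, the sketch below zips a label string with its identifier string. The pairing and the helper name `pair_terms` are assumptions; the report does not state the mapping explicitly.

```python
def pair_terms(labels: str, term_ids: str) -> list[tuple[str, str]]:
    """Pair semicolon-separated labels with identifiers by position (assumed alignment)."""
    names = labels.split(";")
    ids = term_ids.split(";")
    if len(names) != len(ids):
        raise ValueError("labels and term ids have different lengths")
    return list(zip(names, ids))


pairs = pair_terms(
    "Calprotectin Measurement;Harvey-Bradshaw Index Clinical Classification",
    "NCIT:C82005;NCIT:C191036",
)
for name, term_id in pairs:
    print(term_id, name)
```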

Length

[Figure: Histogram of lengths of the category]

Value                      Count   Frequency (%)
snomed:443172007           834     76.0%
ncit:c82005                183     16.7%
ncit:c82005;ncit:c191036   80      7.3%

Most occurring characters

Value               Count   Frequency (%)
0                   2274    13.2%
7                   1668    9.7%
4                   1668    9.7%
N                   1177    6.8%
:                   1177    6.8%
2                   1097    6.3%
1                   994     5.8%
3                   914     5.3%
S                   834     4.8%
D                   834     4.8%
Other values (11)   4640    26.9%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown):

Value       Count   Frequency (%)
(unknown)   17277   100.0%

Most frequent character per category, script, and block

Because (unknown) is the only category, script, and block, each of these tables repeats the "Most occurring characters" table above.

fmt_role
Categorical

Distinct 3
Distinct (%) 1.9%
Missing 21725
Missing (%) 99.3%
Memory size 171.1 KiB
Recipient (after procedure)
109 
Recipient (before procedure)
35 
Donor
12 

Length

Max length 28
Median length 27
Mean length 25.532051
Min length 5

Characters and Unicode

Total characters 3983
Distinct characters 18
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Recipient (before procedure)
2nd row Recipient (before procedure)
3rd row Recipient (before procedure)
4th row Recipient (before procedure)
5th row Recipient (before procedure)

Common Values

ValueCountFrequency (%)
Recipient (after procedure) 109
 
0.5%
Recipient (before procedure) 35
 
0.2%
Donor 12
 
0.1%
(Missing) 21725
99.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
recipient 144
32.4%
procedure 144
32.4%
after 109
24.5%
before 35
 
7.9%
donor 12
 
2.7%

Most occurring characters

ValueCountFrequency (%)
e 755
19.0%
r 444
11.1%
c 288
 
7.2%
i 288
 
7.2%
p 288
 
7.2%
288
 
7.2%
t 253
 
6.4%
o 203
 
5.1%
n 156
 
3.9%
R 144
 
3.6%
Other values (8) 876
22.0%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3983
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 755
19.0%
r 444
11.1%
c 288
 
7.2%
i 288
 
7.2%
p 288
 
7.2%
288
 
7.2%
t 253
 
6.4%
o 203
 
5.1%
n 156
 
3.9%
R 144
 
3.6%
Other values (8) 876
22.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3983
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 755
19.0%
r 444
11.1%
c 288
 
7.2%
i 288
 
7.2%
p 288
 
7.2%
288
 
7.2%
t 253
 
6.4%
o 203
 
5.1%
n 156
 
3.9%
R 144
 
3.6%
Other values (8) 876
22.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3983
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 755
19.0%
r 444
11.1%
c 288
 
7.2%
i 288
 
7.2%
p 288
 
7.2%
288
 
7.2%
t 253
 
6.4%
o 203
 
5.1%
n 156
 
3.9%
R 144
 
3.6%
Other values (8) 876
22.0%

fmt_id
Categorical

              curated_md_report  concatenated_md_report
Distinct      45                 45
Distinct (%)  31.0%              31.0%
Missing       21736              22443
Missing (%)   99.3%              99.4%
Memory size   171.1 KiB          176.6 KiB
IaniroG_2020_2022_287
 
5
IaniroG_2020_2022_281
 
5
IaniroG_2020_2022_288
 
5
IaniroG_2020_2022_286
 
5
IaniroG_2020_2022_277
 
5
Other values (40)
120 
IaniroG_2020_2022_287
 
5
IaniroG_2020_2022_281
 
5
IaniroG_2020_2022_288
 
5
IaniroG_2020_2022_286
 
5
IaniroG_2020_2022_277
 
5
Other values (40)
120 

Length

               curated_md_report  concatenated_md_report
Max length     285                285
Median length  21                 21
Mean length    24.793103          24.793103
Min length     21                 21

Characters and Unicode

                     curated_md_report  concatenated_md_report
Total characters     3595               3595
Distinct characters  19                 19
Distinct categories  1                  1
Distinct scripts     1                  1
Distinct blocks      1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            curated_md_report  concatenated_md_report
Unique      8                  8
Unique (%)  5.5%               5.5%

Sample

         curated_md_report      concatenated_md_report
1st row  IaniroG_2020_2022_260  IaniroG_2020_2022_260
2nd row  IaniroG_2020_2022_264  IaniroG_2020_2022_264
3rd row  IaniroG_2020_2022_267  IaniroG_2020_2022_267
4th row  IaniroG_2020_2022_268  IaniroG_2020_2022_268
5th row  IaniroG_2020_2022_262  IaniroG_2020_2022_262

Common Values

ValueCountFrequency (%)
IaniroG_2020_2022_287 5
 
< 0.1%
IaniroG_2020_2022_281 5
 
< 0.1%
IaniroG_2020_2022_288 5
 
< 0.1%
IaniroG_2020_2022_286 5
 
< 0.1%
IaniroG_2020_2022_277 5
 
< 0.1%
IaniroG_2020_2022_273 5
 
< 0.1%
IaniroG_2020_2022_284 5
 
< 0.1%
IaniroG_2020_2022_278 5
 
< 0.1%
IaniroG_2020_2022_274 5
 
< 0.1%
IaniroG_2020_2022_285 5
 
< 0.1%
Other values (35) 95
 
0.4%
(Missing) 21736
99.3%
ValueCountFrequency (%)
IaniroG_2020_2022_287 5
 
< 0.1%
IaniroG_2020_2022_281 5
 
< 0.1%
IaniroG_2020_2022_288 5
 
< 0.1%
IaniroG_2020_2022_286 5
 
< 0.1%
IaniroG_2020_2022_277 5
 
< 0.1%
IaniroG_2020_2022_273 5
 
< 0.1%
IaniroG_2020_2022_284 5
 
< 0.1%
IaniroG_2020_2022_278 5
 
< 0.1%
IaniroG_2020_2022_274 5
 
< 0.1%
IaniroG_2020_2022_285 5
 
< 0.1%
Other values (35) 95
 
0.4%
(Missing) 22443
99.4%

Length

Histogram of lengths of the category

Common Values (Plot)

curated_md_report


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

concatenated_md_report


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
ValueCountFrequency (%)
ianirog_2020_2022_287 5
 
3.4%
ianirog_2020_2022_288 5
 
3.4%
ianirog_2020_2022_286 5
 
3.4%
ianirog_2020_2022_277 5
 
3.4%
ianirog_2020_2022_273 5
 
3.4%
ianirog_2020_2022_284 5
 
3.4%
ianirog_2020_2022_278 5
 
3.4%
ianirog_2020_2022_274 5
 
3.4%
ianirog_2020_2022_285 5
 
3.4%
ianirog_2020_2022_271 5
 
3.4%
Other values (35) 95
65.5%
ValueCountFrequency (%)
ianirog_2020_2022_287 5
 
3.4%
ianirog_2020_2022_288 5
 
3.4%
ianirog_2020_2022_286 5
 
3.4%
ianirog_2020_2022_277 5
 
3.4%
ianirog_2020_2022_273 5
 
3.4%
ianirog_2020_2022_284 5
 
3.4%
ianirog_2020_2022_278 5
 
3.4%
ianirog_2020_2022_274 5
 
3.4%
ianirog_2020_2022_285 5
 
3.4%
ianirog_2020_2022_271 5
 
3.4%
Other values (35) 95
65.5%

Most occurring characters

ValueCountFrequency (%)
2 1037
28.8%
0 526
14.6%
_ 510
14.2%
G 170
 
4.7%
a 170
 
4.7%
I 170
 
4.7%
o 170
 
4.7%
r 170
 
4.7%
i 170
 
4.7%
n 170
 
4.7%
Other values (9) 332
 
9.2%
ValueCountFrequency (%)
2 1037
28.8%
0 526
14.6%
_ 510
14.2%
G 170
 
4.7%
a 170
 
4.7%
I 170
 
4.7%
o 170
 
4.7%
r 170
 
4.7%
i 170
 
4.7%
n 170
 
4.7%
Other values (9) 332
 
9.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3595
100.0%
ValueCountFrequency (%)
(unknown) 3595
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
2 1037
28.8%
0 526
14.6%
_ 510
14.2%
G 170
 
4.7%
a 170
 
4.7%
I 170
 
4.7%
o 170
 
4.7%
r 170
 
4.7%
i 170
 
4.7%
n 170
 
4.7%
Other values (9) 332
 
9.2%
ValueCountFrequency (%)
2 1037
28.8%
0 526
14.6%
_ 510
14.2%
G 170
 
4.7%
a 170
 
4.7%
I 170
 
4.7%
o 170
 
4.7%
r 170
 
4.7%
i 170
 
4.7%
n 170
 
4.7%
Other values (9) 332
 
9.2%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3595
100.0%
ValueCountFrequency (%)
(unknown) 3595
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
2 1037
28.8%
0 526
14.6%
_ 510
14.2%
G 170
 
4.7%
a 170
 
4.7%
I 170
 
4.7%
o 170
 
4.7%
r 170
 
4.7%
i 170
 
4.7%
n 170
 
4.7%
Other values (9) 332
 
9.2%
ValueCountFrequency (%)
2 1037
28.8%
0 526
14.6%
_ 510
14.2%
G 170
 
4.7%
a 170
 
4.7%
I 170
 
4.7%
o 170
 
4.7%
r 170
 
4.7%
i 170
 
4.7%
n 170
 
4.7%
Other values (9) 332
 
9.2%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3595
100.0%
ValueCountFrequency (%)
(unknown) 3595
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
2 1037
28.8%
0 526
14.6%
_ 510
14.2%
G 170
 
4.7%
a 170
 
4.7%
I 170
 
4.7%
o 170
 
4.7%
r 170
 
4.7%
i 170
 
4.7%
n 170
 
4.7%
Other values (9) 332
 
9.2%
ValueCountFrequency (%)
2 1037
28.8%
0 526
14.6%
_ 510
14.2%
G 170
 
4.7%
a 170
 
4.7%
I 170
 
4.7%
o 170
 
4.7%
r 170
 
4.7%
i 170
 
4.7%
n 170
 
4.7%
Other values (9) 332
 
9.2%

sex
Categorical

              curated_md_report  concatenated_md_report
Distinct      2                  2
Distinct (%)  < 0.1%             < 0.1%
Missing       2558               2558
Missing (%)   11.7%              11.3%
Memory size   171.1 KiB          176.6 KiB
Female
9693 
Male
9630 
Female
10157 
Male
9873 

Length

               curated_md_report  concatenated_md_report
Max length     6                  6
Median length  6                  6
Mean length    5.0032604          5.0141787
Min length     4                  4

Characters and Unicode

                     curated_md_report  concatenated_md_report
Total characters     96678              100434
Distinct characters  6                  6
Distinct categories  1                  1
Distinct scripts     1                  1
Distinct blocks      1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            curated_md_report  concatenated_md_report
Unique      0                  0
Unique (%)  0.0%               0.0%

Sample

         curated_md_report  concatenated_md_report
1st row  Female             Female
2nd row  Male               Male
3rd row  Male               Male
4th row  Male               Male
5th row  Male               Male

Common Values

ValueCountFrequency (%)
Female 9693
44.3%
Male 9630
44.0%
(Missing) 2558
 
11.7%
ValueCountFrequency (%)
Female 10157
45.0%
Male 9873
43.7%
(Missing) 2558
 
11.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for curated_md_report and concatenated_md_report; images not reproduced.]
ValueCountFrequency (%)
female 9693
50.2%
male 9630
49.8%
ValueCountFrequency (%)
female 10157
50.7%
male 9873
49.3%
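
The two sets of percentages shown for `sex` use different denominators: the Common Values table divides by all observations, missing included, while the lowercase value table directly above divides by non-missing values only. The following short sketch reproduces the curated_md_report figures from the counts listed above; the variable names are illustrative and not part of the report.

```python
# Counts copied from the curated_md_report Common Values table for `sex`.
counts = {"Female": 9693, "Male": 9630}
missing = 2558

non_missing = sum(counts.values())
total_rows = non_missing + missing

# Share of all observations (the Common Values percentages).
share_of_all = {k: round(100 * v / total_rows, 1) for k, v in counts.items()}
missing_share = round(100 * missing / total_rows, 1)

# Share of non-missing values only (the lowercase value table percentages).
share_of_present = {k: round(100 * v / non_missing, 1) for k, v in counts.items()}

print(total_rows, share_of_all, missing_share, share_of_present)
```

The total of 21881 rows matches the number of observations reported for curated_md_report in the overview.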

Most occurring characters

ValueCountFrequency (%)
e 29016
30.0%
a 19323
20.0%
l 19323
20.0%
F 9693
 
10.0%
m 9693
 
10.0%
M 9630
 
10.0%
ValueCountFrequency (%)
e 30187
30.1%
a 20030
19.9%
l 20030
19.9%
F 10157
 
10.1%
m 10157
 
10.1%
M 9873
 
9.8%

Most occurring categories

ValueCountFrequency (%)
(unknown) 96678
100.0%
ValueCountFrequency (%)
(unknown) 100434
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 29016
30.0%
a 19323
20.0%
l 19323
20.0%
F 9693
 
10.0%
m 9693
 
10.0%
M 9630
 
10.0%
ValueCountFrequency (%)
e 30187
30.1%
a 20030
19.9%
l 20030
19.9%
F 10157
 
10.1%
m 10157
 
10.1%
M 9873
 
9.8%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 96678
100.0%
ValueCountFrequency (%)
(unknown) 100434
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 29016
30.0%
a 19323
20.0%
l 19323
20.0%
F 9693
 
10.0%
m 9693
 
10.0%
M 9630
 
10.0%
ValueCountFrequency (%)
e 30187
30.1%
a 20030
19.9%
l 20030
19.9%
F 10157
 
10.1%
m 10157
 
10.1%
M 9873
 
9.8%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 96678
100.0%
ValueCountFrequency (%)
(unknown) 100434
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 29016
30.0%
a 19323
20.0%
l 19323
20.0%
F 9693
 
10.0%
m 9693
 
10.0%
M 9630
 
10.0%
ValueCountFrequency (%)
e 30187
30.1%
a 20030
19.9%
l 20030
19.9%
F 10157
 
10.1%
m 10157
 
10.1%
M 9873
 
9.8%
Distinct 2
Distinct (%) < 0.1%
Missing 2558
Missing (%) 11.7%
Memory size 171.1 KiB
NCIT:C16576
9693 
NCIT:C20197
9630 

Length

Max length 11
Median length 11
Mean length 11
Min length 11

Characters and Unicode

Total characters 212553
Distinct characters 12
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row NCIT:C16576
2nd row NCIT:C20197
3rd row NCIT:C20197
4th row NCIT:C20197
5th row NCIT:C20197

Common Values

ValueCountFrequency (%)
NCIT:C16576 9693
44.3%
NCIT:C20197 9630
44.0%
(Missing) 2558
 
11.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ncit:c16576 9693
50.2%
ncit:c20197 9630
49.8%

Most occurring characters

ValueCountFrequency (%)
C 38646
18.2%
6 19386
9.1%
N 19323
9.1%
I 19323
9.1%
T 19323
9.1%
: 19323
9.1%
1 19323
9.1%
7 19323
9.1%
5 9693
 
4.6%
2 9630
 
4.5%
Other values (2) 19260
9.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 212553
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
C 38646
18.2%
6 19386
9.1%
N 19323
9.1%
I 19323
9.1%
T 19323
9.1%
: 19323
9.1%
1 19323
9.1%
7 19323
9.1%
5 9693
 
4.6%
2 9630
 
4.5%
Other values (2) 19260
9.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 212553
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
C 38646
18.2%
6 19386
9.1%
N 19323
9.1%
I 19323
9.1%
T 19323
9.1%
: 19323
9.1%
1 19323
9.1%
7 19323
9.1%
5 9693
 
4.6%
2 9630
 
4.5%
Other values (2) 19260
9.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 212553
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
C 38646
18.2%
6 19386
9.1%
N 19323
9.1%
I 19323
9.1%
T 19323
9.1%
: 19323
9.1%
1 19323
9.1%
7 19323
9.1%
5 9693
 
4.6%
2 9630
 
4.5%
Other values (2) 19260
9.1%

hla
Categorical

Distinct 35
Distinct (%) 3.9%
Missing 20981
Missing (%) 95.9%
Memory size 171.1 KiB
HLA protein complex with DQ5 serotype
225 
HLA-DRB1*04:01 protein complex
102 
HLA-DQA1*02:01 protein complex;HLA protein complex with DQ5 serotype
98 
HLA-DRB1*04:04 protein complex
84 
HLA protein complex with DQ5 serotype;HLA protein complex with DQ5 serotype
49 
Other values (30)
342 

Length

Max length 175
Median length 144
Mean length 64.896667
Min length 30

Characters and Unicode

Total characters 58407
Distinct characters 35
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) 0.1%

Sample

1st row HLA-DQB1*03:02 protein complex;HLA-DQB1*05:01 protein complex;HLA-DRB1*04:04 protein complex
2nd row HLA-DQB1*03:02 protein complex;HLA-DQB1*05:01 protein complex;HLA-DRB1*04:04 protein complex
3rd row HLA-DQB1*03:02 protein complex;HLA-DQB1*05:01 protein complex;HLA-DRB1*04:04 protein complex
4th row HLA-DQB1*03:02 protein complex;HLA-DQB1*05:01 protein complex;HLA-DRB1*04:04 protein complex
5th row HLA-DQB1*03:02 protein complex;HLA-DQB1*05:01 protein complex;HLA-DRB1*04:04 protein complex

Common Values

ValueCountFrequency (%)
HLA protein complex with DQ5 serotype 225
 
1.0%
HLA-DRB1*04:01 protein complex 102
 
0.5%
HLA-DQA1*02:01 protein complex;HLA protein complex with DQ5 serotype 98
 
0.4%
HLA-DRB1*04:04 protein complex 84
 
0.4%
HLA protein complex with DQ5 serotype;HLA protein complex with DQ5 serotype 49
 
0.2%
HLA-DRB1*04:01 protein complex;HLA protein complex with DQ3 serotype;HLA protein complex with DQ5 serotype 41
 
0.2%
HLA-DQB1*03:02 protein complex;HLA protein complex with DQ4 serotype;HLA-DRB1*04:01 protein complex 34
 
0.2%
HLA-DRB1*04:04 protein complex;HLA protein complex with DQ3 serotype;HLA-DQA1*02:01 protein complex 32
 
0.1%
HLA-DRB1*04:04 protein complex;HLA protein complex with DQ3 serotype;HLA protein complex with DQ5 serotype 31
 
0.1%
HLA protein complex with DQ3 serotype;HLA protein complex with DQ5 serotype 28
 
0.1%
Other values (25) 176
 
0.8%
(Missing) 20981
95.9%
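
Several `hla` cells hold more than one term joined by `;`, so counting whole cells (as in the table above) differs from counting individual terms. The sketch below splits on the separator before counting. The helper `count_terms` and the sample list (three values copied from the table plus one missing cell) are assumptions for illustration, not part of the report.

```python
from collections import Counter


def count_terms(cells, sep=";"):
    """Count individual terms across cells that may hold several joined terms."""
    counts = Counter()
    for cell in cells:
        if cell is None:  # skip missing cells
            continue
        for term in cell.split(sep):
            term = term.strip()
            if term:
                counts[term] += 1
    return counts


# Hypothetical sample: three values from the Common Values table and one missing cell.
cells = [
    "HLA protein complex with DQ5 serotype",
    "HLA-DQA1*02:01 protein complex;HLA protein complex with DQ5 serotype",
    "HLA protein complex with DQ5 serotype;HLA protein complex with DQ5 serotype",
    None,
]
term_counts = count_terms(cells)
print(term_counts.most_common())
```

Note that a cell repeating the same term twice contributes two counts here; deduplicating within a cell would be a separate design choice.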

Length

Histogram of lengths of the category
ValueCountFrequency (%)
protein 1702
23.9%
complex 1319
18.6%
with 935
13.2%
dq5 574
 
8.1%
serotype 516
 
7.3%
hla 371
 
5.2%
complex;hla 292
 
4.1%
serotype;hla 272
 
3.8%
dq3 256
 
3.6%
hla-drb1*04:04 168
 
2.4%
Other values (19) 704
9.9%

Most occurring characters

ValueCountFrequency (%)
6209
 
10.6%
e 5274
 
9.0%
p 4339
 
7.4%
o 4339
 
7.4%
t 3572
 
6.1%
r 2637
 
4.5%
i 2637
 
4.5%
A 1856
 
3.2%
H 1702
 
2.9%
l 1702
 
2.9%
Other values (25) 24140
41.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 58407
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
6209
 
10.6%
e 5274
 
9.0%
p 4339
 
7.4%
o 4339
 
7.4%
t 3572
 
6.1%
r 2637
 
4.5%
i 2637
 
4.5%
A 1856
 
3.2%
H 1702
 
2.9%
l 1702
 
2.9%
Other values (25) 24140
41.3%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 58407
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
6209
 
10.6%
e 5274
 
9.0%
p 4339
 
7.4%
o 4339
 
7.4%
t 3572
 
6.1%
r 2637
 
4.5%
i 2637
 
4.5%
A 1856
 
3.2%
H 1702
 
2.9%
l 1702
 
2.9%
Other values (25) 24140
41.3%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 58407
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
6209
 
10.6%
e 5274
 
9.0%
p 4339
 
7.4%
o 4339
 
7.4%
t 3572
 
6.1%
r 2637
 
4.5%
i 2637
 
4.5%
A 1856
 
3.2%
H 1702
 
2.9%
l 1702
 
2.9%
Other values (25) 24140
41.3%
Distinct 35
Distinct (%) 3.9%
Missing 20981
Missing (%) 95.9%
Memory size 171.1 KiB
MRO:0001626
225 
MRO:0001290
102 
MRO:0001211;MRO:0001626
98 
MRO:0001293
84 
MRO:0001626;MRO:0001626
49 
Other values (30)
342 

Length

Max length 59
Median length 47
Mean length 21.693333
Min length 11

Characters and Unicode

Total characters 19524
Distinct characters 14
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) 0.1%

Sample

1st row MRO:0001240;MRO:0001247;MRO:0001293
2nd row MRO:0001240;MRO:0001247;MRO:0001293
3rd row MRO:0001240;MRO:0001247;MRO:0001293
4th row MRO:0001240;MRO:0001247;MRO:0001293
5th row MRO:0001240;MRO:0001247;MRO:0001293

Common Values

ValueCountFrequency (%)
MRO:0001626 225
 
1.0%
MRO:0001290 102
 
0.5%
MRO:0001211;MRO:0001626 98
 
0.4%
MRO:0001293 84
 
0.4%
MRO:0001626;MRO:0001626 49
 
0.2%
MRO:0001290;MRO:0001622;MRO:0001626 41
 
0.2%
MRO:0001240;MRO:0001625;MRO:0001290 34
 
0.2%
MRO:0001293;MRO:0001622;MRO:0001211 32
 
0.1%
MRO:0001293;MRO:0001622;MRO:0001626 31
 
0.1%
MRO:0001622;MRO:0001626 28
 
0.1%
Other values (25) 176
 
0.8%
(Missing) 20981
95.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
mro:0001626 225
25.0%
mro:0001290 102
11.3%
mro:0001211;mro:0001626 98
10.9%
mro:0001293 84
 
9.3%
mro:0001626;mro:0001626 49
 
5.4%
mro:0001290;mro:0001622;mro:0001626 41
 
4.6%
mro:0001240;mro:0001625;mro:0001290 34
 
3.8%
mro:0001293;mro:0001622;mro:0001211 32
 
3.6%
mro:0001293;mro:0001622;mro:0001626 31
 
3.4%
mro:0001622;mro:0001626 28
 
3.1%
Other values (25) 176
19.6%

Most occurring characters

ValueCountFrequency (%)
0 5496
28.1%
1 2028
 
10.4%
2 1948
 
10.0%
M 1702
 
8.7%
R 1702
 
8.7%
O 1702
 
8.7%
: 1702
 
8.7%
6 1509
 
7.7%
; 802
 
4.1%
9 494
 
2.5%
Other values (4) 439
 
2.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 19524
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 5496
28.1%
1 2028
 
10.4%
2 1948
 
10.0%
M 1702
 
8.7%
R 1702
 
8.7%
O 1702
 
8.7%
: 1702
 
8.7%
6 1509
 
7.7%
; 802
 
4.1%
9 494
 
2.5%
Other values (4) 439
 
2.2%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 19524
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 5496
28.1%
1 2028
 
10.4%
2 1948
 
10.0%
M 1702
 
8.7%
R 1702
 
8.7%
O 1702
 
8.7%
: 1702
 
8.7%
6 1509
 
7.7%
; 802
 
4.1%
9 494
 
2.5%
Other values (4) 439
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 19524
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 5496
28.1%
1 2028
 
10.4%
2 1948
 
10.0%
M 1702
 
8.7%
R 1702
 
8.7%
O 1702
 
8.7%
: 1702
 
8.7%
6 1509
 
7.7%
; 802
 
4.1%
9 494
 
2.5%
Other values (4) 439
 
2.2%

smoker
Categorical

              curated_md_report  concatenated_md_report
Distinct      4                  4
Distinct (%)  0.1%               0.1%
Missing       18901              19608
Missing (%)   86.4%              86.8%
Memory size   171.1 KiB          176.6 KiB
Non-smoker (finding)
1584 
Non-smoker (finding);Never smoked tobacco (finding)
799 
Smoker (finding)
437 
Non-smoker (finding);Ex-smoker (finding)
160 
Non-smoker (finding)
1584 
Non-smoker (finding);Never smoked tobacco (finding)
799 
Smoker (finding)
437 
Non-smoker (finding);Ex-smoker (finding)
160 

Length

               curated_md_report  concatenated_md_report
Max length     51                 51
Median length  20                 20
Mean length    28.798993          28.798993
Min length     16                 16

Characters and Unicode

                     curated_md_report  concatenated_md_report
Total characters     85821              85821
Distinct characters  25                 25
Distinct categories  1                  1
Distinct scripts     1                  1
Distinct blocks      1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            curated_md_report  concatenated_md_report
Unique      0                  0
Unique (%)  0.0%               0.0%

Sample

         curated_md_report                                    concatenated_md_report
1st row  Non-smoker (finding);Never smoked tobacco (finding)  Non-smoker (finding);Never smoked tobacco (finding)
2nd row  Non-smoker (finding);Never smoked tobacco (finding)  Non-smoker (finding);Never smoked tobacco (finding)
3rd row  Non-smoker (finding);Never smoked tobacco (finding)  Non-smoker (finding);Never smoked tobacco (finding)
4th row  Non-smoker (finding);Never smoked tobacco (finding)  Non-smoker (finding);Never smoked tobacco (finding)
5th row  Non-smoker (finding);Never smoked tobacco (finding)  Non-smoker (finding);Never smoked tobacco (finding)

Common Values

ValueCountFrequency (%)
Non-smoker (finding) 1584
 
7.2%
Non-smoker (finding);Never smoked tobacco (finding) 799
 
3.7%
Smoker (finding) 437
 
2.0%
Non-smoker (finding);Ex-smoker (finding) 160
 
0.7%
(Missing) 18901
86.4%
ValueCountFrequency (%)
Non-smoker (finding) 1584
 
7.0%
Non-smoker (finding);Never smoked tobacco (finding) 799
 
3.5%
Smoker (finding) 437
 
1.9%
Non-smoker (finding);Ex-smoker (finding) 160
 
0.7%
(Missing) 19608
86.8%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for curated_md_report and concatenated_md_report; images not reproduced.]
ValueCountFrequency (%)
finding 2980
35.0%
non-smoker 2543
29.9%
finding);never 799
 
9.4%
smoked 799
 
9.4%
tobacco 799
 
9.4%
smoker 437
 
5.1%
finding);ex-smoker 160
 
1.9%
ValueCountFrequency (%)
finding 2980
35.0%
non-smoker 2543
29.9%
finding);never 799
 
9.4%
smoked 799
 
9.4%
tobacco 799
 
9.4%
smoker 437
 
5.1%
finding);ex-smoker 160
 
1.9%

Most occurring characters

ValueCountFrequency (%)
n 10421
 
12.1%
o 8080
 
9.4%
i 7878
 
9.2%
e 5537
 
6.5%
5537
 
6.5%
d 4738
 
5.5%
g 3939
 
4.6%
) 3939
 
4.6%
m 3939
 
4.6%
k 3939
 
4.6%
Other values (15) 27874
32.5%
ValueCountFrequency (%)
n 10421
 
12.1%
o 8080
 
9.4%
i 7878
 
9.2%
e 5537
 
6.5%
5537
 
6.5%
d 4738
 
5.5%
g 3939
 
4.6%
) 3939
 
4.6%
m 3939
 
4.6%
k 3939
 
4.6%
Other values (15) 27874
32.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 85821
100.0%
ValueCountFrequency (%)
(unknown) 85821
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
n 10421
 
12.1%
o 8080
 
9.4%
i 7878
 
9.2%
e 5537
 
6.5%
5537
 
6.5%
d 4738
 
5.5%
g 3939
 
4.6%
) 3939
 
4.6%
m 3939
 
4.6%
k 3939
 
4.6%
Other values (15) 27874
32.5%
ValueCountFrequency (%)
n 10421
 
12.1%
o 8080
 
9.4%
i 7878
 
9.2%
e 5537
 
6.5%
5537
 
6.5%
d 4738
 
5.5%
g 3939
 
4.6%
) 3939
 
4.6%
m 3939
 
4.6%
k 3939
 
4.6%
Other values (15) 27874
32.5%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 85821
100.0%
ValueCountFrequency (%)
(unknown) 85821
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
n 10421
 
12.1%
o 8080
 
9.4%
i 7878
 
9.2%
e 5537
 
6.5%
5537
 
6.5%
d 4738
 
5.5%
g 3939
 
4.6%
) 3939
 
4.6%
m 3939
 
4.6%
k 3939
 
4.6%
Other values (15) 27874
32.5%
ValueCountFrequency (%)
n 10421
 
12.1%
o 8080
 
9.4%
i 7878
 
9.2%
e 5537
 
6.5%
5537
 
6.5%
d 4738
 
5.5%
g 3939
 
4.6%
) 3939
 
4.6%
m 3939
 
4.6%
k 3939
 
4.6%
Other values (15) 27874
32.5%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 85821
100.0%
ValueCountFrequency (%)
(unknown) 85821
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
n 10421
 
12.1%
o 8080
 
9.4%
i 7878
 
9.2%
e 5537
 
6.5%
5537
 
6.5%
d 4738
 
5.5%
g 3939
 
4.6%
) 3939
 
4.6%
m 3939
 
4.6%
k 3939
 
4.6%
Other values (15) 27874
32.5%
ValueCountFrequency (%)
n 10421
 
12.1%
o 8080
 
9.4%
i 7878
 
9.2%
e 5537
 
6.5%
5537
 
6.5%
d 4738
 
5.5%
g 3939
 
4.6%
) 3939
 
4.6%
m 3939
 
4.6%
k 3939
 
4.6%
Other values (15) 27874
32.5%
Distinct 4
Distinct (%) 0.1%
Missing 18901
Missing (%) 86.4%
Memory size 171.1 KiB
SNOMED:8392000
1584 
SNOMED:8392000;SNOMED:266919005
799 
SNOMED:77176002
437 
SNOMED:8392000;SNOMED:8517006
160 

Length

Max length 31
Median length 14
Mean length 19.510067
Min length 14

Characters and Unicode

Total characters 58140
Distinct characters 17
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row SNOMED:8392000;SNOMED:266919005
2nd row SNOMED:8392000;SNOMED:266919005
3rd row SNOMED:8392000;SNOMED:266919005
4th row SNOMED:8392000;SNOMED:266919005
5th row SNOMED:8392000;SNOMED:266919005

Common Values

ValueCountFrequency (%)
SNOMED:8392000 1584
 
7.2%
SNOMED:8392000;SNOMED:266919005 799
 
3.7%
SNOMED:77176002 437
 
2.0%
SNOMED:8392000;SNOMED:8517006 160
 
0.7%
(Missing) 18901
86.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
snomed:8392000 1584
53.2%
snomed:8392000;snomed:266919005 799
26.8%
snomed:77176002 437
 
14.7%
snomed:8392000;snomed:8517006 160
 
5.4%

Most occurring characters

ValueCountFrequency (%)
0 10421
17.9%
9 4141
 
7.1%
S 3939
 
6.8%
O 3939
 
6.8%
M 3939
 
6.8%
E 3939
 
6.8%
D 3939
 
6.8%
: 3939
 
6.8%
N 3939
 
6.8%
2 3779
 
6.5%
Other values (7) 12226
21.0%

Most occurring categories

ValueCountFrequency (%)
(unknown) 58140
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 10421
17.9%
9 4141
 
7.1%
S 3939
 
6.8%
O 3939
 
6.8%
M 3939
 
6.8%
E 3939
 
6.8%
D 3939
 
6.8%
: 3939
 
6.8%
N 3939
 
6.8%
2 3779
 
6.5%
Other values (7) 12226
21.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 58140
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 10421
17.9%
9 4141
 
7.1%
S 3939
 
6.8%
O 3939
 
6.8%
M 3939
 
6.8%
E 3939
 
6.8%
D 3939
 
6.8%
: 3939
 
6.8%
N 3939
 
6.8%
2 3779
 
6.5%
Other values (7) 12226
21.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 58140
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 10421
17.9%
9 4141
 
7.1%
S 3939
 
6.8%
O 3939
 
6.8%
M 3939
 
6.8%
E 3939
 
6.8%
D 3939
 
6.8%
: 3939
 
6.8%
N 3939
 
6.8%
2 3779
 
6.5%
Other values (7) 12226
21.0%

control
Categorical

              curated_md_report  concatenated_md_report
Distinct      3                  3
Distinct (%)  < 0.1%             < 0.1%
Missing       0                  707
Missing (%)   0.0%               3.1%
Memory size   171.1 KiB          176.6 KiB
Study Control
14822 
Case
7034 
Not Used
 
25
Study Control
14822 
Case
7034 
Not Used
 
25

Length

               curated_md_report  concatenated_md_report
Max length     13                 13
Median length  13                 13
Mean length    10.101092          10.101092
Min length     4                  4

Characters and Unicode

                     curated_md_report  concatenated_md_report
Total characters     221022             221022
Distinct characters  16                 16
Distinct categories  1                  1
Distinct scripts     1                  1
Distinct blocks      1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            curated_md_report  concatenated_md_report
Unique      0                  0
Unique (%)  0.0%               0.0%

Sample

         curated_md_report  concatenated_md_report
1st row  Study Control      Study Control
2nd row  Study Control      Study Control
3rd row  Study Control      Study Control
4th row  Study Control      Study Control
5th row  Study Control      Study Control

Common Values

ValueCountFrequency (%)
Study Control 14822
67.7%
Case 7034
32.1%
Not Used 25
 
0.1%
ValueCountFrequency (%)
Study Control 14822
65.6%
Case 7034
31.1%
Not Used 25
 
0.1%
(Missing) 707
 
3.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for curated_md_report and concatenated_md_report; images not reproduced.]
ValueCountFrequency (%)
study 14822
40.4%
control 14822
40.4%
case 7034
19.2%
not 25
 
0.1%
used 25
 
0.1%
ValueCountFrequency (%)
study 14822
40.4%
control 14822
40.4%
case 7034
19.2%
not 25
 
0.1%
used 25
 
0.1%

Most occurring characters

ValueCountFrequency (%)
t 29669
13.4%
o 29669
13.4%
C 21856
9.9%
d 14847
6.7%
14847
6.7%
S 14822
6.7%
u 14822
6.7%
y 14822
6.7%
n 14822
6.7%
r 14822
6.7%
Other values (6) 36024
16.3%
ValueCountFrequency (%)
t 29669
13.4%
o 29669
13.4%
C 21856
9.9%
d 14847
6.7%
14847
6.7%
S 14822
6.7%
u 14822
6.7%
y 14822
6.7%
n 14822
6.7%
r 14822
6.7%
Other values (6) 36024
16.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 221022
100.0%
ValueCountFrequency (%)
(unknown) 221022
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
t 29669
13.4%
o 29669
13.4%
C 21856
9.9%
d 14847
6.7%
14847
6.7%
S 14822
6.7%
u 14822
6.7%
y 14822
6.7%
n 14822
6.7%
r 14822
6.7%
Other values (6) 36024
16.3%
ValueCountFrequency (%)
t 29669
13.4%
o 29669
13.4%
C 21856
9.9%
d 14847
6.7%
14847
6.7%
S 14822
6.7%
u 14822
6.7%
y 14822
6.7%
n 14822
6.7%
r 14822
6.7%
Other values (6) 36024
16.3%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 221022
100.0%
ValueCountFrequency (%)
(unknown) 221022
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
t 29669
13.4%
o 29669
13.4%
C 21856
9.9%
d 14847
6.7%
14847
6.7%
S 14822
6.7%
u 14822
6.7%
y 14822
6.7%
n 14822
6.7%
r 14822
6.7%
Other values (6) 36024
16.3%
ValueCountFrequency (%)
t 29669
13.4%
o 29669
13.4%
C 21856
9.9%
d 14847
6.7%
14847
6.7%
S 14822
6.7%
u 14822
6.7%
y 14822
6.7%
n 14822
6.7%
r 14822
6.7%
Other values (6) 36024
16.3%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 221022
100.0%
ValueCountFrequency (%)
(unknown) 221022
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
t 29669
13.4%
o 29669
13.4%
C 21856
9.9%
d 14847
6.7%
14847
6.7%
S 14822
6.7%
u 14822
6.7%
y 14822
6.7%
n 14822
6.7%
r 14822
6.7%
Other values (6) 36024
16.3%
ValueCountFrequency (%)
t 29669
13.4%
o 29669
13.4%
C 21856
9.9%
d 14847
6.7%
14847
6.7%
S 14822
6.7%
u 14822
6.7%
y 14822
6.7%
n 14822
6.7%
r 14822
6.7%
Other values (6) 36024
16.3%
Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 171.1 KiB
NCIT:C142703
14822 
NCIT:C49152
7034 
NCIT:C69062
 
25

Length

Max length 12
Median length 12
Mean length 11.677391
Min length 11

Characters and Unicode

Total characters 255513
Distinct characters 14
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row NCIT:C142703
2nd row NCIT:C142703
3rd row NCIT:C142703
4th row NCIT:C142703
5th row NCIT:C142703

Common Values

ValueCountFrequency (%)
NCIT:C142703 14822
67.7%
NCIT:C49152 7034
32.1%
NCIT:C69062 25
 
0.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ncit:c142703 14822
67.7%
ncit:c49152 7034
32.1%
ncit:c69062 25
 
0.1%

Most occurring characters

ValueCountFrequency (%)
C 43762
17.1%
N 21881
8.6%
I 21881
8.6%
T 21881
8.6%
: 21881
8.6%
2 21881
8.6%
1 21856
8.6%
4 21856
8.6%
0 14847
 
5.8%
7 14822
 
5.8%
Other values (4) 28965
11.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 255513
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
C 43762
17.1%
N 21881
8.6%
I 21881
8.6%
T 21881
8.6%
: 21881
8.6%
2 21881
8.6%
1 21856
8.6%
4 21856
8.6%
0 14847
 
5.8%
7 14822
 
5.8%
Other values (4) 28965
11.3%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 255513
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
C 43762
17.1%
N 21881
8.6%
I 21881
8.6%
T 21881
8.6%
: 21881
8.6%
2 21881
8.6%
1 21856
8.6%
4 21856
8.6%
0 14847
 
5.8%
7 14822
 
5.8%
Other values (4) 28965
11.3%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 255513
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
C 43762
17.1%
N 21881
8.6%
I 21881
8.6%
T 21881
8.6%
: 21881
8.6%
2 21881
8.6%
1 21856
8.6%
4 21856
8.6%
0 14847
 
5.8%
7 14822
 
5.8%
Other values (4) 28965
11.3%

target_condition
Categorical

              curated_md_report  concatenated_md_report
Distinct      47                 47
Distinct (%)  0.2%               0.2%
Missing       0                  0
Missing (%)   0.0%               0.0%
Memory size   171.1 KiB          176.6 KiB
human gut microbiome
8306 
Inflammatory Bowel Disease
2282 
abnormal glucose tolerance;Metabolic Syndrome;control;Type 2 Diabetes Mellitus;Heart Failure
1831 
human microbiome
860 
otitis;pneumonia;bronchitis;Respiratory tract infection;sepsis;Skin Infection;Cough;gastroenteritis;Tonsillitis;pyelonephritis;cystitis;Fever;Infection;stomatitis;salmonellosis
 
785
Other values (42)
7817 
human gut microbiome
8306 
Inflammatory Bowel Disease
2637 
abnormal glucose tolerance;Metabolic Syndrome;control;Type 2 Diabetes Mellitus;Heart Failure
1831 
human microbiome
860 
otitis;pneumonia;bronchitis;Respiratory tract infection;sepsis;Skin Infection;Cough;gastroenteritis;Tonsillitis;pyelonephritis;cystitis;Fever;Infection;stomatitis;salmonellosis
 
785
Other values (42)
8169 

Length

               curated_md_report  concatenated_md_report
Max length     176                176
Median length  92                 92
Mean length    35.062246          34.593412
Min length     8                  8

Characters and Unicode

                     curated_md_report  concatenated_md_report
Total characters     767197             781396
Distinct characters  50                 50
Distinct categories  1                  1
Distinct scripts     1                  1
Distinct blocks      1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
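
As a rough illustration of what "character properties" means here, the sketch below tallies characters and their Unicode general categories for a few strings. It uses Python's standard `unicodedata` module, which is an assumption made for illustration and not necessarily what the profiler uses internally; the helper name `char_profile` is hypothetical.

```python
import unicodedata
from collections import Counter


def char_profile(values):
    """Count characters and their Unicode general categories across strings.

    Hypothetical helper for illustration only; not part of the profiling report.
    """
    chars = Counter()
    categories = Counter()
    for value in values:
        for ch in value:
            chars[ch] += 1
            # unicodedata.category returns a two-letter code such as
            # 'Ll' (lowercase letter), 'Nd' (decimal digit) or 'Zs' (space).
            categories[unicodedata.category(ch)] += 1
    return chars, categories


# Two values taken from the target_condition column above.
chars, categories = char_profile(["human gut microbiome", "Type 2 Diabetes Mellitus"])
print(chars.most_common(3))
print(dict(categories))
```

Counting per category in this way is what makes a "most frequent character per category" table possible once each character has been assigned a category.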

Unique

            curated_md_report  concatenated_md_report
Unique      0                  0
Unique (%)  0.0%               0.0%

Sample

         curated_md_report     concatenated_md_report
1st row  human gut microbiome  human gut microbiome
2nd row  human gut microbiome  human gut microbiome
3rd row  human gut microbiome  human gut microbiome
4th row  human gut microbiome  human gut microbiome
5th row  human gut microbiome  human gut microbiome

Common Values

ValueCountFrequency (%)
human gut microbiome 8306
38.0%
Inflammatory Bowel Disease 2282
 
10.4%
abnormal glucose tolerance;Metabolic Syndrome;control;Type 2 Diabetes Mellitus;Heart Failure 1831
 
8.4%
human microbiome 860
 
3.9%
otitis;pneumonia;bronchitis;Respiratory tract infection;sepsis;Skin Infection;Cough;gastroenteritis;Tonsillitis;pyelonephritis;cystitis;Fever;Infection;stomatitis;salmonellosis 785
 
3.6%
colorectal cancer;Adenoma 616
 
2.8%
colorectal cancer 503
 
2.3%
premature birth 453
 
2.1%
abnormal glucose tolerance;Type 2 Diabetes Mellitus 441
 
2.0%
Type 2 Diabetes Mellitus 400
 
1.8%
Other values (37) 5404
24.7%
ValueCountFrequency (%)
human gut microbiome 8306
36.8%
Inflammatory Bowel Disease 2637
 
11.7%
abnormal glucose tolerance;Metabolic Syndrome;control;Type 2 Diabetes Mellitus;Heart Failure 1831
 
8.1%
human microbiome 860
 
3.8%
otitis;pneumonia;bronchitis;Respiratory tract infection;sepsis;Skin Infection;Cough;gastroenteritis;Tonsillitis;pyelonephritis;cystitis;Fever;Infection;stomatitis;salmonellosis 785
 
3.5%
colorectal cancer;Adenoma 642
 
2.8%
colorectal cancer 503
 
2.2%
Schizophrenia 502
 
2.2%
premature birth 453
 
2.0%
abnormal glucose tolerance;Type 2 Diabetes Mellitus 441
 
2.0%
Other values (37) 5628
24.9%

Length

Histogram of lengths of the category

Common Values (Plot)

curated_md_report


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

concatenated_md_report


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
ValueCountFrequency (%)
human 9861
 
13.4%
microbiome 9792
 
13.3%
gut 8710
 
11.8%
diabetes 3365
 
4.6%
disease 3024
 
4.1%
2 2932
 
4.0%
bowel 2801
 
3.8%
inflammatory 2376
 
3.2%
glucose 2272
 
3.1%
abnormal 2272
 
3.1%
Other values (79) 26155
35.6%
ValueCountFrequency (%)
human 9861
 
13.1%
microbiome 9792
 
13.1%
gut 8710
 
11.6%
disease 3379
 
4.5%
diabetes 3365
 
4.5%
bowel 3156
 
4.2%
2 2932
 
3.9%
inflammatory 2731
 
3.6%
abnormal 2272
 
3.0%
glucose 2272
 
3.0%
Other values (79) 26560
35.4%

Most occurring characters

ValueCountFrequency (%)
e 69579
 
9.1%
i 64933
 
8.5%
o 62085
 
8.1%
51679
 
6.7%
a 51318
 
6.7%
t 50326
 
6.6%
m 46460
 
6.1%
r 42986
 
5.6%
n 40443
 
5.3%
l 37182
 
4.8%
Other values (40) 250206
32.6%
ValueCountFrequency (%)
e 71075
 
9.1%
i 65913
 
8.4%
o 63226
 
8.1%
a 52787
 
6.8%
52442
 
6.7%
t 50734
 
6.5%
m 47196
 
6.0%
r 43746
 
5.6%
n 41149
 
5.3%
l 37998
 
4.9%
Other values (40) 255130
32.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 767197
100.0%
ValueCountFrequency (%)
(unknown) 781396
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 69579
 
9.1%
i 64933
 
8.5%
o 62085
 
8.1%
51679
 
6.7%
a 51318
 
6.7%
t 50326
 
6.6%
m 46460
 
6.1%
r 42986
 
5.6%
n 40443
 
5.3%
l 37182
 
4.8%
Other values (40) 250206
32.6%
ValueCountFrequency (%)
e 71075
 
9.1%
i 65913
 
8.4%
o 63226
 
8.1%
a 52787
 
6.8%
52442
 
6.7%
t 50734
 
6.5%
m 47196
 
6.0%
r 43746
 
5.6%
n 41149
 
5.3%
l 37998
 
4.9%
Other values (40) 255130
32.7%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 767197
100.0%
ValueCountFrequency (%)
(unknown) 781396
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 69579
 
9.1%
i 64933
 
8.5%
o 62085
 
8.1%
51679
 
6.7%
a 51318
 
6.7%
t 50326
 
6.6%
m 46460
 
6.1%
r 42986
 
5.6%
n 40443
 
5.3%
l 37182
 
4.8%
Other values (40) 250206
32.6%
ValueCountFrequency (%)
e 71075
 
9.1%
i 65913
 
8.4%
o 63226
 
8.1%
a 52787
 
6.8%
52442
 
6.7%
t 50734
 
6.5%
m 47196
 
6.0%
r 43746
 
5.6%
n 41149
 
5.3%
l 37998
 
4.9%
Other values (40) 255130
32.7%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 767197
100.0%
ValueCountFrequency (%)
(unknown) 781396
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 69579
 
9.1%
i 64933
 
8.5%
o 62085
 
8.1%
51679
 
6.7%
a 51318
 
6.7%
t 50326
 
6.6%
m 46460
 
6.1%
r 42986
 
5.6%
n 40443
 
5.3%
l 37182
 
4.8%
Other values (40) 250206
32.6%
ValueCountFrequency (%)
e 71075
 
9.1%
i 65913
 
8.4%
o 63226
 
8.1%
a 52787
 
6.8%
52442
 
6.7%
t 50734
 
6.5%
m 47196
 
6.0%
r 43746
 
5.6%
n 41149
 
5.3%
l 37998
 
4.9%
Other values (40) 255130
32.7%
Distinct      47
Distinct (%)  0.2%
Missing       0
Missing (%)   0.0%
Memory size   171.1 KiB
OHMI:0000020
8306 
NCIT:C3138
2282 
EFO:0002546;NCIT:C84442;EFO:0001461;NCIT:C26747;NCIT:C50577
1831 
OHMI:0000002
860 
SYMP:0000873;EFO:0003106;EFO:0009661;HP:0011947;MP:0005044;NCIT:C35025;HP:0012735;EFO:1001463;NCIT:C116006;EFO:1001141;EFO:1000025;HP:0001945;NCIT:C128320;EFO:0009688;MONDO:0000827
 
785
Other values (42)
7817 

Length

Max length     180
Median length  59
Mean length    23.773548
Min length     9

Characters and Unicode

Total characters     520189
Distinct characters  30
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  OHMI:0000020
2nd row  OHMI:0000020
3rd row  OHMI:0000020
4th row  OHMI:0000020
5th row  OHMI:0000020

Common Values

ValueCountFrequency (%)
OHMI:0000020 8306
38.0%
NCIT:C3138 2282
 
10.4%
EFO:0002546;NCIT:C84442;EFO:0001461;NCIT:C26747;NCIT:C50577 1831
 
8.4%
OHMI:0000002 860
 
3.9%
SYMP:0000873;EFO:0003106;EFO:0009661;HP:0011947;MP:0005044;NCIT:C35025;HP:0012735;EFO:1001463;NCIT:C116006;EFO:1001141;EFO:1000025;HP:0001945;NCIT:C128320;EFO:0009688;MONDO:0000827 785
 
3.6%
EFO:0005842;NCIT:C2855 616
 
2.8%
EFO:0005842 503
 
2.3%
EFO:0003917 453
 
2.1%
EFO:0002546;NCIT:C26747 441
 
2.0%
NCIT:C26747 400
 
1.8%
Other values (37) 5404
24.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ohmi:0000020 8306
38.0%
ncit:c3138 2282
 
10.4%
efo:0002546;ncit:c84442;efo:0001461;ncit:c26747;ncit:c50577 1831
 
8.4%
ohmi:0000002 860
 
3.9%
symp:0000873;efo:0003106;efo:0009661;hp:0011947;mp:0005044;ncit:c35025;hp:0012735;efo:1001463;ncit:c116006;efo:1001141;efo:1000025;hp:0001945;ncit:c128320;efo:0009688;mondo:0000827 785
 
3.6%
efo:0005842;ncit:c2855 616
 
2.8%
efo:0005842 503
 
2.3%
efo:0003917 453
 
2.1%
efo:0002546;ncit:c26747 441
 
2.0%
ncit:c26747 400
 
1.8%
Other values (37) 5404
24.7%

Most occurring characters

ValueCountFrequency (%)
0 125912
24.2%
: 44208
 
8.5%
C 30073
 
5.8%
2 27455
 
5.3%
O 25804
 
5.0%
I 25300
 
4.9%
1 23865
 
4.6%
; 22642
 
4.4%
4 20578
 
4.0%
5 17103
 
3.3%
Other values (20) 157249
30.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 520189
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 125912
24.2%
: 44208
 
8.5%
C 30073
 
5.8%
2 27455
 
5.3%
O 25804
 
5.0%
I 25300
 
4.9%
1 23865
 
4.6%
; 22642
 
4.4%
4 20578
 
4.0%
5 17103
 
3.3%
Other values (20) 157249
30.2%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 520189
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 125912
24.2%
: 44208
 
8.5%
C 30073
 
5.8%
2 27455
 
5.3%
O 25804
 
5.0%
I 25300
 
4.9%
1 23865
 
4.6%
; 22642
 
4.4%
4 20578
 
4.0%
5 17103
 
3.3%
Other values (20) 157249
30.2%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 520189
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 125912
24.2%
: 44208
 
8.5%
C 30073
 
5.8%
2 27455
 
5.3%
O 25804
 
5.0%
I 25300
 
4.9%
1 23865
 
4.6%
; 22642
 
4.4%
4 20578
 
4.0%
5 17103
 
3.3%
Other values (20) 157249
30.2%

disease
Text

              curated_md_report  concatenated_md_report
Distinct      206                206
Distinct (%)  0.9%               0.9%
Missing       0                  0
Missing (%)   0.0%               0.0%
Memory size   171.1 KiB          176.6 KiB

Length

               curated_md_report  concatenated_md_report
Max length     142                142
Median length  7                  7
Mean length    15.542708          15.923942
Min length     5                  5

Characters and Unicode

                     curated_md_report  concatenated_md_report
Total characters     340090             359690
Distinct characters  54                 54
Distinct categories  1                  1
Distinct scripts     1                  1
Distinct blocks      1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            curated_md_report  concatenated_md_report
Unique      54                 53
Unique (%)  0.2%               0.2%

Sample

         curated_md_report  concatenated_md_report
1st row  Healthy            Healthy
2nd row  Healthy            Healthy
3rd row  Healthy            Healthy
4th row  Healthy            Healthy
5th row  Healthy            Healthy
ValueCountFrequency (%)
healthy 14133
36.6%
bowel 1739
 
4.5%
inflammatory 1736
 
4.5%
diabetes 1397
 
3.6%
mellitus 1319
 
3.4%
disease 1233
 
3.2%
2 1206
 
3.1%
type 1018
 
2.6%
disease;crohn's 952
 
2.5%
disease;ulcerative 741
 
1.9%
Other values (242) 13089
33.9%
ValueCountFrequency (%)
healthy 14432
35.7%
bowel 2094
 
5.2%
inflammatory 2091
 
5.2%
diabetes 1397
 
3.5%
mellitus 1319
 
3.3%
disease 1233
 
3.0%
2 1206
 
3.0%
disease;ulcerative 1096
 
2.7%
colitis 1096
 
2.7%
type 1018
 
2.5%
Other values (242) 13461
33.3%

Most occurring characters

ValueCountFrequency (%)
e 41089
12.1%
a 34662
 
10.2%
l 29163
 
8.6%
t 29002
 
8.5%
y 19985
 
5.9%
h 18098
 
5.3%
i 17867
 
5.3%
16682
 
4.9%
s 15896
 
4.7%
o 15682
 
4.6%
Other values (44) 101964
30.0%
ValueCountFrequency (%)
e 43297
12.0%
a 36461
 
10.1%
l 30909
 
8.6%
t 30501
 
8.5%
y 20639
 
5.7%
i 19503
 
5.4%
h 18424
 
5.1%
17855
 
5.0%
s 17069
 
4.7%
o 16854
 
4.7%
Other values (44) 108178
30.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 340090
100.0%
ValueCountFrequency (%)
(unknown) 359690
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 41089
12.1%
a 34662
 
10.2%
l 29163
 
8.6%
t 29002
 
8.5%
y 19985
 
5.9%
h 18098
 
5.3%
i 17867
 
5.3%
16682
 
4.9%
s 15896
 
4.7%
o 15682
 
4.6%
Other values (44) 101964
30.0%
ValueCountFrequency (%)
e 43297
12.0%
a 36461
 
10.1%
l 30909
 
8.6%
t 30501
 
8.5%
y 20639
 
5.7%
i 19503
 
5.4%
h 18424
 
5.1%
17855
 
5.0%
s 17069
 
4.7%
o 16854
 
4.7%
Other values (44) 108178
30.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 340090
100.0%
ValueCountFrequency (%)
(unknown) 359690
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 41089
12.1%
a 34662
 
10.2%
l 29163
 
8.6%
t 29002
 
8.5%
y 19985
 
5.9%
h 18098
 
5.3%
i 17867
 
5.3%
16682
 
4.9%
s 15896
 
4.7%
o 15682
 
4.6%
Other values (44) 101964
30.0%
ValueCountFrequency (%)
e 43297
12.0%
a 36461
 
10.1%
l 30909
 
8.6%
t 30501
 
8.5%
y 20639
 
5.7%
i 19503
 
5.4%
h 18424
 
5.1%
17855
 
5.0%
s 17069
 
4.7%
o 16854
 
4.7%
Other values (44) 108178
30.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 340090
100.0%
ValueCountFrequency (%)
(unknown) 359690
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 41089
12.1%
a 34662
 
10.2%
l 29163
 
8.6%
t 29002
 
8.5%
y 19985
 
5.9%
h 18098
 
5.3%
i 17867
 
5.3%
16682
 
4.9%
s 15896
 
4.7%
o 15682
 
4.6%
Other values (44) 101964
30.0%
ValueCountFrequency (%)
e 43297
12.0%
a 36461
 
10.1%
l 30909
 
8.6%
t 30501
 
8.5%
y 20639
 
5.7%
i 19503
 
5.4%
h 18424
 
5.1%
17855
 
5.0%
s 17069
 
4.7%
o 16854
 
4.7%
Other values (44) 108178
30.1%
Distinct      206
Distinct (%)  0.9%
Missing       0
Missing (%)   0.0%
Memory size   171.1 KiB

Length

Max length     67
Median length  12
Mean length    13.987661
Min length     9

Characters and Unicode

Total characters     306064
Distinct characters  33
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      54
Unique (%)  0.2%

Sample

1st row  NCIT:C115935
2nd row  NCIT:C115935
3rd row  NCIT:C115935
4th row  NCIT:C115935
5th row  NCIT:C115935
ValueCountFrequency (%)
ncit:c115935 14133
64.6%
ncit:c3138;efo:0000384 952
 
4.4%
ncit:c26747 893
 
4.1%
ncit:c3138;efo:0000729 741
 
3.4%
efo:0003917 448
 
2.0%
efo:0005842 442
 
2.0%
ncit:c84442 365
 
1.7%
efo:0002546;ncit:c84442 266
 
1.2%
efo:0002546 265
 
1.2%
efo:0003914 214
 
1.0%
Other values (196) 3162
 
14.5%

Most occurring characters

ValueCountFrequency (%)
C 38603
12.6%
1 33964
11.1%
5 32532
10.6%
: 26216
8.6%
0 24195
7.9%
3 21922
7.2%
I 19725
 
6.4%
T 19695
 
6.4%
N 19511
 
6.4%
9 17094
 
5.6%
Other values (23) 52607
17.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 306064
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
C 38603
12.6%
1 33964
11.1%
5 32532
10.6%
: 26216
8.6%
0 24195
7.9%
3 21922
7.2%
I 19725
 
6.4%
T 19695
 
6.4%
N 19511
 
6.4%
9 17094
 
5.6%
Other values (23) 52607
17.2%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 306064
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
C 38603
12.6%
1 33964
11.1%
5 32532
10.6%
: 26216
8.6%
0 24195
7.9%
3 21922
7.2%
I 19725
 
6.4%
T 19695
 
6.4%
N 19511
 
6.4%
9 17094
 
5.6%
Other values (23) 52607
17.2%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 306064
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
C 38603
12.6%
1 33964
11.1%
5 32532
10.6%
: 26216
8.6%
0 24195
7.9%
3 21922
7.2%
I 19725
 
6.4%
T 19695
 
6.4%
N 19511
 
6.4%
9 17094
 
5.6%
Other values (23) 52607
17.2%
              curated_md_report  concatenated_md_report
Distinct      2                  2
Distinct (%)  < 0.1%             < 0.1%
Missing       7306               7932
Missing (%)   33.4%              35.1%
Memory size   42.9 KiB           44.2 KiB
False
12632 
True
1943 
(Missing)
7306 
False
12713 
True
1943 
(Missing)
7932 
ValueCountFrequency (%)
False 12632
57.7%
True 1943
 
8.9%
(Missing) 7306
33.4%
ValueCountFrequency (%)
False 12713
56.3%
True 1943
 
8.6%
(Missing) 7932
35.1%

curated_md_report


concatenated_md_report


treatment
Text

              curated_md_report  concatenated_md_report
Distinct      947                947
Distinct (%)  40.3%              40.3%
Missing       19534              20241
Missing (%)   89.3%              89.6%
Memory size   171.1 KiB          176.6 KiB
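
The percentages in this block appear to use two different denominators: Missing (%) is relative to all observations, while Distinct (%) and Unique (%) are relative to the non-missing values only. The sketch below reproduces the curated_md_report figures for `treatment` under that assumption, taking the row count (21881) from the dataset statistics at the top of the report and the unique count (730) from the Unique block of this variable; the helper `pct` is hypothetical.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# curated_md_report figures for the `treatment` variable.
n_rows = 21881     # Number of observations (dataset statistics)
missing = 19534    # Missing
distinct = 947     # Distinct
unique = 730       # Unique (values that occur exactly once)

present = n_rows - missing            # non-missing values

missing_pct = pct(missing, n_rows)    # share of all rows that are missing
distinct_pct = pct(distinct, present) # distinct values among non-missing rows
unique_pct = pct(unique, present)     # unique values among non-missing rows

print(present, missing_pct, distinct_pct, unique_pct)
```

Under this reading the computed values match the reported 89.3%, 40.3% and 31.1%, which is why a sparsely populated column can show a high Distinct (%) despite having few distinct values relative to the whole dataset.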

Length

               curated_md_report  concatenated_md_report
Max length     364                364
Median length  259                259
Mean length    85.180656          85.180656
Min length     4                  4

Characters and Unicode

                     curated_md_report  concatenated_md_report
Total characters     199919             199919
Distinct characters  55                 55
Distinct categories  1                  1
Distinct scripts     1                  1
Distinct blocks      1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            curated_md_report  concatenated_md_report
Unique      730                730
Unique (%)  31.1%              31.1%

Sample

         curated_md_report                concatenated_md_report
1st row  second generation antipsychotic  second generation antipsychotic
2nd row  second generation antipsychotic  second generation antipsychotic
3rd row  second generation antipsychotic  second generation antipsychotic
4th row  second generation antipsychotic  second generation antipsychotic
5th row  Dopamine Antagonist              Dopamine Antagonist
ValueCountFrequency (%)
antilipidemic 707
 
5.8%
agent;antihypertensive 607
 
5.0%
antihypertensive 492
 
4.0%
receptor 469
 
3.8%
agents;anti-diabetic 454
 
3.7%
inhibitor 441
 
3.6%
pump 420
 
3.4%
antibiotic 406
 
3.3%
ii 403
 
3.3%
enzyme 377
 
3.1%
Other values (591) 7444
60.9%
ValueCountFrequency (%)
antilipidemic 707
 
5.8%
agent;antihypertensive 607
 
5.0%
antihypertensive 492
 
4.0%
receptor 469
 
3.8%
agents;anti-diabetic 454
 
3.7%
inhibitor 441
 
3.6%
pump 420
 
3.4%
antibiotic 406
 
3.3%
ii 403
 
3.3%
enzyme 377
 
3.1%
Other values (591) 7444
60.9%

Most occurring characters

ValueCountFrequency (%)
i 24625
 
12.3%
n 20507
 
10.3%
t 19733
 
9.9%
e 17544
 
8.8%
9873
 
4.9%
o 9208
 
4.6%
A 9195
 
4.6%
r 8798
 
4.4%
s 7648
 
3.8%
; 7392
 
3.7%
Other values (45) 65396
32.7%
ValueCountFrequency (%)
i 24625
 
12.3%
n 20507
 
10.3%
t 19733
 
9.9%
e 17544
 
8.8%
9873
 
4.9%
o 9208
 
4.6%
A 9195
 
4.6%
r 8798
 
4.4%
s 7648
 
3.8%
; 7392
 
3.7%
Other values (45) 65396
32.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 199919
100.0%
ValueCountFrequency (%)
(unknown) 199919
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
i 24625
 
12.3%
n 20507
 
10.3%
t 19733
 
9.9%
e 17544
 
8.8%
9873
 
4.9%
o 9208
 
4.6%
A 9195
 
4.6%
r 8798
 
4.4%
s 7648
 
3.8%
; 7392
 
3.7%
Other values (45) 65396
32.7%
ValueCountFrequency (%)
i 24625
 
12.3%
n 20507
 
10.3%
t 19733
 
9.9%
e 17544
 
8.8%
9873
 
4.9%
o 9208
 
4.6%
A 9195
 
4.6%
r 8798
 
4.4%
s 7648
 
3.8%
; 7392
 
3.7%
Other values (45) 65396
32.7%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 199919
100.0%
ValueCountFrequency (%)
(unknown) 199919
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
i 24625
 
12.3%
n 20507
 
10.3%
t 19733
 
9.9%
e 17544
 
8.8%
9873
 
4.9%
o 9208
 
4.6%
A 9195
 
4.6%
r 8798
 
4.4%
s 7648
 
3.8%
; 7392
 
3.7%
Other values (45) 65396
32.7%
ValueCountFrequency (%)
i 24625
 
12.3%
n 20507
 
10.3%
t 19733
 
9.9%
e 17544
 
8.8%
9873
 
4.9%
o 9208
 
4.6%
A 9195
 
4.6%
r 8798
 
4.4%
s 7648
 
3.8%
; 7392
 
3.7%
Other values (45) 65396
32.7%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 199919
100.0%
ValueCountFrequency (%)
(unknown) 199919
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
i 24625
 
12.3%
n 20507
 
10.3%
t 19733
 
9.9%
e 17544
 
8.8%
9873
 
4.9%
o 9208
 
4.6%
A 9195
 
4.6%
r 8798
 
4.4%
s 7648
 
3.8%
; 7392
 
3.7%
Other values (45) 65396
32.7%
ValueCountFrequency (%)
i 24625
 
12.3%
n 20507
 
10.3%
t 19733
 
9.9%
e 17544
 
8.8%
9873
 
4.9%
o 9208
 
4.6%
A 9195
 
4.6%
r 8798
 
4.4%
s 7648
 
3.8%
; 7392
 
3.7%
Other values (45) 65396
32.7%
Distinct      948
Distinct (%)  16.3%
Missing       16053
Missing (%)   73.4%
Memory size   171.1 KiB

Length

Max length     198
Median length  11
Mean length    25.919355
Min length     9

Characters and Unicode

Total characters     151058
Distinct characters  26
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      730
Unique (%)  12.5%

Sample

1st row  NCIT:C41132
2nd row  NCIT:C41132
3rd row  NCIT:C41132
4th row  NCIT:C41132
5th row  NCIT:C41132
ValueCountFrequency (%)
ncit:c41132 3481
59.7%
ncit:c1500 87
 
1.5%
ncit:c61612 84
 
1.4%
ncit:c29723 68
 
1.2%
ncit:c843 55
 
0.9%
ncit:c41132;ncit:c357 52
 
0.9%
ncit:c357 52
 
0.9%
ncit:c1500;ncit:c2363 49
 
0.8%
chebi:87631 47
 
0.8%
ncit:c257 43
 
0.7%
Other values (938) 1810
31.1%

Most occurring characters

ValueCountFrequency (%)
C 23010
15.2%
: 13181
 
8.7%
1 12065
 
8.0%
I 11673
 
7.7%
0 10852
 
7.2%
T 10796
 
7.1%
N 10314
 
6.8%
2 9127
 
6.0%
3 7824
 
5.2%
; 7392
 
4.9%
Other values (16) 34824
23.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 151058
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
C 23010
15.2%
: 13181
 
8.7%
1 12065
 
8.0%
I 11673
 
7.7%
0 10852
 
7.2%
T 10796
 
7.1%
N 10314
 
6.8%
2 9127
 
6.0%
3 7824
 
5.2%
; 7392
 
4.9%
Other values (16) 34824
23.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 151058
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
C 23010
15.2%
: 13181
 
8.7%
1 12065
 
8.0%
I 11673
 
7.7%
0 10852
 
7.2%
T 10796
 
7.1%
N 10314
 
6.8%
2 9127
 
6.0%
3 7824
 
5.2%
; 7392
 
4.9%
Other values (16) 34824
23.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 151058
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
C 23010
15.2%
: 13181
 
8.7%
1 12065
 
8.0%
I 11673
 
7.7%
0 10852
 
7.2%
T 10796
 
7.1%
N 10314
 
6.8%
2 9127
 
6.0%
3 7824
 
5.2%
; 7392
 
4.9%
Other values (16) 34824
23.1%
              curated_md_report  concatenated_md_report
Distinct      6                  6
Distinct (%)  1.0%               1.0%
Missing       21252              21959
Missing (%)   97.1%              97.2%
Memory size   171.1 KiB          176.6 KiB
I
161 
III
127 
0
113 
II
95 
IV
93 
I
161 
III
127 
0
113 
II
95 
IV
93 

Length

               curated_md_report  concatenated_md_report
Max length     6                  6
Median length  3                  3
Mean length    2.0206677          2.0206677
Min length     1                  1

Characters and Unicode

                     curated_md_report  concatenated_md_report
Total characters     1271               1271
Distinct characters  4                  4
Distinct categories  1                  1
Distinct scripts     1                  1
Distinct blocks      1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            curated_md_report  concatenated_md_report
Unique      0                  0
Unique (%)  0.0%               0.0%

Sample

         curated_md_report  concatenated_md_report
1st row  III                III
2nd row  I                  I
3rd row  III                III
4th row  I                  I
5th row  II                 II

Common Values

ValueCountFrequency (%)
I 161
 
0.7%
III 127
 
0.6%
0 113
 
0.5%
II 95
 
0.4%
IV 93
 
0.4%
III/IV 40
 
0.2%
(Missing) 21252
97.1%
ValueCountFrequency (%)
I 161
 
0.7%
III 127
 
0.6%
0 113
 
0.5%
II 95
 
0.4%
IV 93
 
0.4%
III/IV 40
 
0.2%
(Missing) 21959
97.2%

Length

Histogram of lengths of the category

Common Values (Plot)

curated_md_report


concatenated_md_report

ValueCountFrequency (%)
i 161
25.6%
iii 127
20.2%
0 113
18.0%
ii 95
15.1%
iv 93
14.8%
iii/iv 40
 
6.4%
ValueCountFrequency (%)
i 161
25.6%
iii 127
20.2%
0 113
18.0%
ii 95
15.1%
iv 93
14.8%
iii/iv 40
 
6.4%

Most occurring characters

ValueCountFrequency (%)
I 985
77.5%
V 133
 
10.5%
0 113
 
8.9%
/ 40
 
3.1%
ValueCountFrequency (%)
I 985
77.5%
V 133
 
10.5%
0 113
 
8.9%
/ 40
 
3.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 1271
100.0%
ValueCountFrequency (%)
(unknown) 1271
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
I 985
77.5%
V 133
 
10.5%
0 113
 
8.9%
/ 40
 
3.1%
ValueCountFrequency (%)
I 985
77.5%
V 133
 
10.5%
0 113
 
8.9%
/ 40
 
3.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 1271
100.0%
ValueCountFrequency (%)
(unknown) 1271
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
I 985
77.5%
V 133
 
10.5%
0 113
 
8.9%
/ 40
 
3.1%
ValueCountFrequency (%)
I 985
77.5%
V 133
 
10.5%
0 113
 
8.9%
/ 40
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 1271
100.0%
ValueCountFrequency (%)
(unknown) 1271
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
I 985
77.5%
V 133
 
10.5%
0 113
 
8.9%
/ 40
 
3.1%
ValueCountFrequency (%)
I 985
77.5%
V 133
 
10.5%
0 113
 
8.9%
/ 40
 
3.1%
              curated_md_report  concatenated_md_report
Distinct      24                 24
Distinct (%)  9.2%               9.2%
Missing       21619              22326
Missing (%)   98.8%              98.8%
Memory size   171.1 KiB          176.6 KiB
t3n0m0
57 
t1n0m0
39 
t2n0m0
36 
t3n1m0
29 
t3n2m0
17 
Other values (19)
84 
t3n0m0
57 
t1n0m0
39 
t2n0m0
36 
t3n1m0
29 
t3n2m0
17 
Other values (19)
84 

Length

               curated_md_report  concatenated_md_report
Max length     7                  7
Median length  6                  6
Mean length    5.9580153          5.9580153
Min length     4                  4

Characters and Unicode

                     curated_md_report  concatenated_md_report
Total characters     1561               1561
Distinct characters  12                 12
Distinct categories  1                  1
Distinct scripts     1                  1
Distinct blocks      1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            curated_md_report  concatenated_md_report
Unique      6                  6
Unique (%)  2.3%               2.3%

Sample

         curated_md_report  concatenated_md_report
1st row  t1n0m0             t1n0m0
2nd row  t3n0m0             t3n0m0
3rd row  t4n0m0             t4n0m0
4th row  t3n0m0             t3n0m0
5th row  ptis               ptis

Common Values

ValueCountFrequency (%)
t3n0m0 57
 
0.3%
t1n0m0 39
 
0.2%
t2n0m0 36
 
0.2%
t3n1m0 29
 
0.1%
t3n2m0 17
 
0.1%
t4n1m0 14
 
0.1%
t3n1m1 14
 
0.1%
t4n1m1 10
 
< 0.1%
ptis 7
 
< 0.1%
t2n1m0 6
 
< 0.1%
Other values (14) 33
 
0.2%
(Missing) 21619
98.8%
ValueCountFrequency (%)
t3n0m0 57
 
0.3%
t1n0m0 39
 
0.2%
t2n0m0 36
 
0.2%
t3n1m0 29
 
0.1%
t3n2m0 17
 
0.1%
t4n1m0 14
 
0.1%
t3n1m1 14
 
0.1%
t4n1m1 10
 
< 0.1%
ptis 7
 
< 0.1%
t2n1m0 6
 
< 0.1%
Other values (14) 33
 
0.1%
(Missing) 22326
98.8%

Length

Histogram of lengths of the category

Common Values (Plot)

curated_md_report


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

concatenated_md_report


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
ValueCountFrequency (%)
t3n0m0 57
21.8%
t1n0m0 39
14.9%
t2n0m0 36
13.7%
t3n1m0 29
11.1%
t3n2m0 17
 
6.5%
t4n1m0 14
 
5.3%
t3n1m1 14
 
5.3%
t4n1m1 10
 
3.8%
ptis 7
 
2.7%
t2n1m0 6
 
2.3%
Other values (14) 33
12.6%
ValueCountFrequency (%)
t3n0m0 57
21.8%
t1n0m0 39
14.9%
t2n0m0 36
13.7%
t3n1m0 29
11.1%
t3n2m0 17
 
6.5%
t4n1m0 14
 
5.3%
t3n1m1 14
 
5.3%
t4n1m1 10
 
3.8%
ptis 7
 
2.7%
t2n1m0 6
 
2.3%
Other values (14) 33
12.6%

Most occurring characters

ValueCountFrequency (%)
0 361
23.1%
t 262
16.8%
n 255
16.3%
m 255
16.3%
1 157
10.1%
3 129
 
8.3%
2 76
 
4.9%
4 38
 
2.4%
i 10
 
0.6%
s 10
 
0.6%
Other values (2) 8
 
0.5%
ValueCountFrequency (%)
0 361
23.1%
t 262
16.8%
n 255
16.3%
m 255
16.3%
1 157
10.1%
3 129
 
8.3%
2 76
 
4.9%
4 38
 
2.4%
i 10
 
0.6%
s 10
 
0.6%
Other values (2) 8
 
0.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 1561
100.0%
ValueCountFrequency (%)
(unknown) 1561
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 361
23.1%
t 262
16.8%
n 255
16.3%
m 255
16.3%
1 157
10.1%
3 129
 
8.3%
2 76
 
4.9%
4 38
 
2.4%
i 10
 
0.6%
s 10
 
0.6%
Other values (2) 8
 
0.5%
ValueCountFrequency (%)
0 361
23.1%
t 262
16.8%
n 255
16.3%
m 255
16.3%
1 157
10.1%
3 129
 
8.3%
2 76
 
4.9%
4 38
 
2.4%
i 10
 
0.6%
s 10
 
0.6%
Other values (2) 8
 
0.5%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 1561
100.0%
ValueCountFrequency (%)
(unknown) 1561
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 361
23.1%
t 262
16.8%
n 255
16.3%
m 255
16.3%
1 157
10.1%
3 129
 
8.3%
2 76
 
4.9%
4 38
 
2.4%
i 10
 
0.6%
s 10
 
0.6%
Other values (2) 8
 
0.5%
ValueCountFrequency (%)
0 361
23.1%
t 262
16.8%
n 255
16.3%
m 255
16.3%
1 157
10.1%
3 129
 
8.3%
2 76
 
4.9%
4 38
 
2.4%
i 10
 
0.6%
s 10
 
0.6%
Other values (2) 8
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 1561
100.0%
ValueCountFrequency (%)
(unknown) 1561
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 361
23.1%
t 262
16.8%
n 255
16.3%
m 255
16.3%
1 157
10.1%
3 129
 
8.3%
2 76
 
4.9%
4 38
 
2.4%
i 10
 
0.6%
s 10
 
0.6%
Other values (2) 8
 
0.5%
ValueCountFrequency (%)
0 361
23.1%
t 262
16.8%
n 255
16.3%
m 255
16.3%
1 157
10.1%
3 129
 
8.3%
2 76
 
4.9%
4 38
 
2.4%
i 10
 
0.6%
s 10
 
0.6%
Other values (2) 8
 
0.5%

unmetadata
Text

              curated_md_report  concatenated_md_report
Distinct      231                231
Distinct (%)  11.2%              11.2%
Missing       19810              20517
Missing (%)   90.5%              90.8%
Memory size   171.1 KiB          176.6 KiB

Length

               curated_md_report  concatenated_md_report
Max length     141                141
Median length  78                 78
Mean length    78.084017          78.084017
Min length     7                  7

Characters and Unicode

                     curated_md_report  concatenated_md_report
Total characters     161712             161712
Distinct characters  57                 57
Distinct categories  1                  1
Distinct scripts     1                  1
Distinct blocks      1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            curated_md_report  concatenated_md_report
Unique      182                182
Unique (%)  8.8%               8.8%

Sample

         curated_md_report       concatenated_md_report
1st row  travel_destination:CMR  travel_destination:CMR
2nd row  travel_destination:CMR  travel_destination:CMR
3rd row  travel_destination:CMR  travel_destination:CMR
4th row  travel_destination:CMR  travel_destination:CMR
5th row  travel_destination:CMR  travel_destination:CMR
ValueCountFrequency (%)
uncurated_metadata:no_immuno_suppressive;no_t2d;no_t1d;no_related_treatments;no_psychiatric_diseases;no_gastro_intestinal_disorder;non_celiac 900
40.0%
uncurated_metadata:no_infection;no_cancer 250
 
11.1%
fobt:no 121
 
5.4%
uncurated_metadata:low_gluten_diet 104
 
4.6%
uncurated_metadata:high_gluten_diet 103
 
4.6%
uncurated_metadata:no_diabetes;non_celiac;no_gi_diseases 97
 
4.3%
fobt:yes 64
 
2.8%
given 45
 
2.0%
as 45
 
2.0%
30 45
 
2.0%
Other values (225) 477
21.2%
ValueCountFrequency (%)
uncurated_metadata:no_immuno_suppressive;no_t2d;no_t1d;no_related_treatments;no_psychiatric_diseases;no_gastro_intestinal_disorder;non_celiac 900
40.0%
uncurated_metadata:no_infection;no_cancer 250
 
11.1%
fobt:no 121
 
5.4%
uncurated_metadata:low_gluten_diet 104
 
4.6%
uncurated_metadata:high_gluten_diet 103
 
4.6%
uncurated_metadata:no_diabetes;non_celiac;no_gi_diseases 97
 
4.3%
fobt:yes 64
 
2.8%
given 45
 
2.0%
as 45
 
2.0%
30 45
 
2.0%
Other values (225) 477
21.2%

Most occurring characters

Identical in curated_md_report and concatenated_md_report:

Value | Count | Frequency (%)
e | 15081 | 9.3%
n | 14915 | 9.2%
_ | 13971 | 8.6%
a | 13463 | 8.3%
t | 13088 | 8.1%
o | 10878 | 6.7%
s | 10718 | 6.6%
i | 9749 | 6.0%
r | 8396 | 5.2%
d | 7135 | 4.4%
Other values (47) | 44318 | 27.4%

Most occurring categories

Identical in curated_md_report and concatenated_md_report:

Value | Count | Frequency (%)
(unknown) | 161712 | 100.0%

Most frequent character per category

(unknown)

Identical in curated_md_report and concatenated_md_report:

Value | Count | Frequency (%)
e | 15081 | 9.3%
n | 14915 | 9.2%
_ | 13971 | 8.6%
a | 13463 | 8.3%
t | 13088 | 8.1%
o | 10878 | 6.7%
s | 10718 | 6.6%
i | 9749 | 6.0%
r | 8396 | 5.2%
d | 7135 | 4.4%
Other values (47) | 44318 | 27.4%

Most occurring scripts

Identical in curated_md_report and concatenated_md_report:

Value | Count | Frequency (%)
(unknown) | 161712 | 100.0%

Most frequent character per script

(unknown)

Identical in curated_md_report and concatenated_md_report:

Value | Count | Frequency (%)
e | 15081 | 9.3%
n | 14915 | 9.2%
_ | 13971 | 8.6%
a | 13463 | 8.3%
t | 13088 | 8.1%
o | 10878 | 6.7%
s | 10718 | 6.6%
i | 9749 | 6.0%
r | 8396 | 5.2%
d | 7135 | 4.4%
Other values (47) | 44318 | 27.4%

Most occurring blocks

Identical in curated_md_report and concatenated_md_report:

Value | Count | Frequency (%)
(unknown) | 161712 | 100.0%

Most frequent character per block

(unknown)

Identical in curated_md_report and concatenated_md_report:

Value | Count | Frequency (%)
e | 15081 | 9.3%
n | 14915 | 9.2%
_ | 13971 | 8.6%
a | 13463 | 8.3%
t | 13088 | 8.1%
o | 10878 | 6.7%
s | 10718 | 6.6%
i | 9749 | 6.0%
r | 8396 | 5.2%
d | 7135 | 4.4%
Other values (47) | 44318 | 27.4%

Statistic | curated_md_report | concatenated_md_report
Distinct | 2 | 2
Distinct (%) | < 0.1% | < 0.1%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 21.5 KiB | 22.2 KiB

Value | curated_md_report count | curated_md_report frequency (%) | concatenated_md_report count | concatenated_md_report frequency (%)
True | 20626 | 94.3% | 21333 | 94.4%
False | 1255 | 5.7% | 1255 | 5.6%
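
The frequency percentages in the True/False tables above can be reproduced from the raw counts. The sketch below is an illustration, assuming each percentage is the count divided by the sum of the counts in its table and rounded to one decimal place; the helper name `frequency_table` is hypothetical. The first table is taken to belong to curated_md_report and the second to concatenated_md_report, because their totals match the 21881 and 22588 observations listed in the overview.

```python
def frequency_table(counts):
    """Return each value's share of the total as a percentage (one decimal)."""
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


# Counts copied from the two True/False tables in this section.
curated = frequency_table({"True": 20626, "False": 1255})
concatenated = frequency_table({"True": 21333, "False": 1255})

print(curated)        # {'True': 94.3, 'False': 5.7}
print(concatenated)   # {'True': 94.4, 'False': 5.6}
```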

[Plots for curated_md_report and concatenated_md_report; images not reproduced.]

Interactions

[Interaction plots: nine panels each for curated_md_report and concatenated_md_report; images not reproduced.]

Correlations

[Correlation plots for curated_md_report and concatenated_md_report; images not reproduced.]

curated_md_report

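Variable | age_group | age_group_ontology_term_id | age_max | age_min | age_years | antibiotics_current_use | body_site | body_site_ontology_term_id | control | control_ontology_term_id | country | country_ontology_term_id | dietary_restriction | feces_phenotype_metric | feces_phenotype_metric_ontology_term_id | fmt_id | fmt_role | hla | hla_ontology_term_id | sex | sex_ontology_term_id | smoker | smoker_ontology_term_id | target_condition | target_condition_ontology_term_id | tumor_staging_ajcc | tumor_staging_tnm | westernized
age_group | 1.000 | 1.000 | 0.711 | 0.712 | 0.844 | 0.147 | 0.099 | 0.099 | 0.175 | 0.175 | 0.431 | 0.431 | 0.332 | 0.415 | 0.415 | 0.771 | 0.103 | 0.145 | 0.145 | 0.060 | 0.060 | 0.058 | 0.058 | 0.496 | 0.496 | 0.091 | 0.146 | 0.150
age_group_ontology_term_id | 1.000 | 1.000 | 0.711 | 0.712 | 0.844 | 0.147 | 0.099 | 0.099 | 0.175 | 0.175 | 0.431 | 0.431 | 0.332 | 0.415 | 0.415 | 0.771 | 0.103 | 0.145 | 0.145 | 0.060 | 0.060 | 0.058 | 0.058 | 0.496 | 0.496 | 0.091 | 0.146 | 0.150
age_max | 0.711 | 0.711 | 1.000 | 0.496 | 1.000 | 0.185 | 0.157 | 0.157 | 0.155 | 0.155 | 0.355 | 0.355 | 0.457 | 0.480 | 0.480 | 0.771 | 0.103 | 1.000 | 1.000 | 0.082 | 0.082 | 0.234 | 0.234 | 0.428 | 0.428 | 0.153 | 0.146 | 0.166
age_min | 0.712 | 0.712 | 0.496 | 1.000 | 1.000 | 0.222 | 0.173 | 0.173 | 0.168 | 0.168 | 0.336 | 0.336 | 0.477 | 0.331 | 0.331 | 0.790 | 0.054 | 1.000 | 1.000 | 0.098 | 0.098 | 0.238 | 0.238 | 0.418 | 0.418 | 0.178 | 0.161 | 0.174
age_years | 0.844 | 0.844 | 1.000 | 1.000 | 1.000 | 0.287 | 0.149 | 0.149 | 0.285 | 0.285 | 0.315 | 0.315 | 0.472 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.111 | 0.111 | 0.416 | 0.416 | 0.416 | 0.416 | 0.091 | 0.161 | 0.134
antibiotics_current_use | 0.147 | 0.147 | 0.185 | 0.222 | 0.287 | 1.000 | 0.110 | 0.110 | 0.196 | 0.196 | 0.458 | 0.458 | 0.000 | 0.295 | 0.295 | 0.812 | 0.400 | 0.159 | 0.159 | 0.011 | 0.011 | 0.384 | 0.384 | 0.530 | 0.530 | 0.064 | 1.000 | 0.061
body_site | 0.099 | 0.099 | 0.157 | 0.173 | 0.149 | 0.110 | 1.000 | 1.000 | 0.148 | 0.148 | 0.289 | 0.289 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.098 | 0.098 | 0.289 | 0.289 | 0.419 | 0.419 | 1.000 | 1.000 | 0.304
body_site_ontology_term_id | 0.099 | 0.099 | 0.157 | 0.173 | 0.149 | 0.110 | 1.000 | 1.000 | 0.148 | 0.148 | 0.289 | 0.289 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.098 | 0.098 | 0.289 | 0.289 | 0.419 | 0.419 | 1.000 | 1.000 | 0.304
control | 0.175 | 0.175 | 0.155 | 0.168 | 0.285 | 0.196 | 0.148 | 0.148 | 1.000 | 1.000 | 0.347 | 0.347 | 0.385 | 0.414 | 0.414 | 0.836 | 0.473 | 0.355 | 0.355 | 0.055 | 0.055 | 0.358 | 0.358 | 0.533 | 0.533 | 0.527 | 1.000 | 0.124
control_ontology_term_id | 0.175 | 0.175 | 0.155 | 0.168 | 0.285 | 0.196 | 0.148 | 0.148 | 1.000 | 1.000 | 0.347 | 0.347 | 0.385 | 0.414 | 0.414 | 0.836 | 0.473 | 0.355 | 0.355 | 0.055 | 0.055 | 0.358 | 0.358 | 0.533 | 0.533 | 0.527 | 1.000 | 0.124
country | 0.431 | 0.431 | 0.355 | 0.336 | 0.315 | 0.458 | 0.289 | 0.289 | 0.347 | 0.347 | 1.000 | 1.000 | 0.545 | 0.787 | 0.787 | 1.000 | 1.000 | 0.442 | 0.442 | 0.195 | 0.195 | 0.543 | 0.543 | 0.552 | 0.552 | 0.243 | 0.365 | 0.976
country_ontology_term_id | 0.431 | 0.431 | 0.355 | 0.336 | 0.315 | 0.458 | 0.289 | 0.289 | 0.347 | 0.347 | 1.000 | 1.000 | 0.545 | 0.787 | 0.787 | 1.000 | 1.000 | 0.442 | 0.442 | 0.195 | 0.195 | 0.543 | 0.543 | 0.552 | 0.552 | 0.243 | 0.365 | 0.976
dietary_restriction | 0.332 | 0.332 | 0.457 | 0.477 | 0.472 | 0.000 | 1.000 | 1.000 | 0.385 | 0.385 | 0.545 | 0.545 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.146 | 0.146 | 1.000 | 1.000 | 0.545 | 0.545 | 1.000 | 0.000 | 1.000
feces_phenotype_metric | 0.415 | 0.415 | 0.480 | 0.331 | 1.000 | 0.295 | 1.000 | 1.000 | 0.414 | 0.414 | 0.787 | 0.787 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.788 | 0.788 | 0.000 | 0.000 | 1.000
feces_phenotype_metric_ontology_term_id | 0.415 | 0.415 | 0.480 | 0.331 | 1.000 | 0.295 | 1.000 | 1.000 | 0.414 | 0.414 | 0.787 | 0.787 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.788 | 0.788 | 0.000 | 0.000 | 1.000
fmt_id | 0.771 | 0.771 | 0.771 | 0.790 | 0.000 | 0.812 | 1.000 | 1.000 | 0.836 | 0.836 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.340 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000
fmt_role | 0.103 | 0.103 | 0.103 | 0.054 | 0.000 | 0.400 | 1.000 | 1.000 | 0.473 | 0.473 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.340 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000
hla | 0.145 | 0.145 | 1.000 | 1.000 | 1.000 | 0.159 | 1.000 | 1.000 | 0.355 | 0.355 | 0.442 | 0.442 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.515 | 0.515 | 0.000 | 0.000 | 0.981 | 0.981 | 0.000 | 0.000 | 1.000
hla_ontology_term_id | 0.145 | 0.145 | 1.000 | 1.000 | 1.000 | 0.159 | 1.000 | 1.000 | 0.355 | 0.355 | 0.442 | 0.442 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.515 | 0.515 | 0.000 | 0.000 | 0.981 | 0.981 | 0.000 | 0.000 | 1.000
sex | 0.060 | 0.060 | 0.082 | 0.098 | 0.111 | 0.011 | 0.098 | 0.098 | 0.055 | 0.055 | 0.195 | 0.195 | 0.146 | 0.000 | 0.000 | 0.000 | 0.000 | 0.515 | 0.515 | 1.000 | 1.000 | 0.157 | 0.157 | 0.182 | 0.182 | 0.000 | 0.000 | 0.003
sex_ontology_term_id | 0.060 | 0.060 | 0.082 | 0.098 | 0.111 | 0.011 | 0.098 | 0.098 | 0.055 | 0.055 | 0.195 | 0.195 | 0.146 | 0.000 | 0.000 | 0.000 | 0.000 | 0.515 | 0.515 | 1.000 | 1.000 | 0.157 | 0.157 | 0.182 | 0.182 | 0.000 | 0.000 | 0.003
smoker | 0.058 | 0.058 | 0.234 | 0.238 | 0.416 | 0.384 | 0.289 | 0.289 | 0.358 | 0.358 | 0.543 | 0.543 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.157 | 0.157 | 1.000 | 1.000 | 0.560 | 0.560 | 0.000 | 0.000 | 0.091
smoker_ontology_term_id | 0.058 | 0.058 | 0.234 | 0.238 | 0.416 | 0.384 | 0.289 | 0.289 | 0.358 | 0.358 | 0.543 | 0.543 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.157 | 0.157 | 1.000 | 1.000 | 0.560 | 0.560 | 0.000 | 0.000 | 0.091
target_condition | 0.496 | 0.496 | 0.428 | 0.418 | 0.416 | 0.530 | 0.419 | 0.419 | 0.533 | 0.533 | 0.552 | 0.552 | 0.545 | 0.788 | 0.788 | 1.000 | 1.000 | 0.981 | 0.981 | 0.182 | 0.182 | 0.560 | 0.560 | 1.000 | 1.000 | 0.315 | 0.422 | 0.712
target_condition_ontology_term_id | 0.496 | 0.496 | 0.428 | 0.418 | 0.416 | 0.530 | 0.419 | 0.419 | 0.533 | 0.533 | 0.552 | 0.552 | 0.545 | 0.788 | 0.788 | 1.000 | 1.000 | 0.981 | 0.981 | 0.182 | 0.182 | 0.560 | 0.560 | 1.000 | 1.000 | 0.315 | 0.422 | 0.712
tumor_staging_ajcc | 0.091 | 0.091 | 0.153 | 0.178 | 0.091 | 0.064 | 1.000 | 1.000 | 0.527 | 0.527 | 0.243 | 0.243 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.315 | 0.315 | 1.000 | 0.943 | 1.000
tumor_staging_tnm | 0.146 | 0.146 | 0.146 | 0.161 | 0.161 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.365 | 0.365 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.422 | 0.422 | 0.943 | 1.000 | 1.000
westernized | 0.150 | 0.150 | 0.166 | 0.174 | 0.134 | 0.061 | 0.304 | 0.304 | 0.124 | 0.124 | 0.976 | 0.976 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.003 | 0.003 | 0.091 | 0.091 | 0.712 | 0.712 | 1.000 | 1.000 | 1.000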

concatenated_md_report

Variable | age_group | age_max | age_min | age_years | antibiotics_current_use | body_site | control | country | dietary_restriction | feces_phenotype_metric | fmt_id | sex | smoker | target_condition | tumor_staging_ajcc | tumor_staging_tnm | westernized
age_group | 1.000 | 0.711 | 0.712 | 0.844 | 0.146 | 0.097 | 0.175 | 0.428 | 0.332 | 0.415 | 0.771 | 0.065 | 0.058 | 0.485 | 0.091 | 0.146 | 0.151
age_max | 0.711 | 1.000 | 0.496 | 1.000 | 0.184 | 0.157 | 0.155 | 0.354 | 0.457 | 0.480 | 0.771 | 0.082 | 0.234 | 0.426 | 0.153 | 0.146 | 0.166
age_min | 0.712 | 0.496 | 1.000 | 1.000 | 0.222 | 0.173 | 0.168 | 0.335 | 0.477 | 0.331 | 0.790 | 0.098 | 0.238 | 0.417 | 0.178 | 0.161 | 0.174
age_years | 0.844 | 1.000 | 1.000 | 1.000 | 0.287 | 0.149 | 0.285 | 0.313 | 0.472 | 1.000 | 0.000 | 0.110 | 0.416 | 0.414 | 0.091 | 0.161 | 0.134
antibiotics_current_use | 0.146 | 0.184 | 0.222 | 0.287 | 1.000 | 0.109 | 0.196 | 0.449 | 0.000 | 0.295 | 0.812 | 0.010 | 0.384 | 0.531 | 0.064 | 1.000 | 0.061
body_site | 0.097 | 0.157 | 0.173 | 0.149 | 0.109 | 1.000 | 0.148 | 0.279 | 1.000 | 1.000 | 1.000 | 0.095 | 0.289 | 0.419 | 1.000 | 1.000 | 0.303
control | 0.175 | 0.155 | 0.168 | 0.285 | 0.196 | 0.148 | 1.000 | 0.347 | 0.385 | 0.414 | 0.836 | 0.055 | 0.358 | 0.533 | 0.527 | 1.000 | 0.124
country | 0.428 | 0.354 | 0.335 | 0.313 | 0.449 | 0.279 | 0.347 | 1.000 | 0.545 | 0.787 | 1.000 | 0.199 | 0.543 | 0.541 | 0.243 | 0.365 | 0.911
dietary_restriction | 0.332 | 0.457 | 0.477 | 0.472 | 0.000 | 1.000 | 0.385 | 0.545 | 1.000 | 0.000 | 0.000 | 0.146 | 1.000 | 0.545 | 1.000 | 0.000 | 1.000
feces_phenotype_metric | 0.415 | 0.480 | 0.331 | 1.000 | 0.295 | 1.000 | 0.414 | 0.787 | 0.000 | 1.000 | 0.000 | 0.000 | 1.000 | 0.788 | 0.000 | 0.000 | 1.000
fmt_id | 0.771 | 0.771 | 0.790 | 0.000 | 0.812 | 1.000 | 0.836 | 1.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 1.000
sex | 0.065 | 0.082 | 0.098 | 0.110 | 0.010 | 0.095 | 0.055 | 0.199 | 0.146 | 0.000 | 0.000 | 1.000 | 0.157 | 0.184 | 0.000 | 0.000 | 0.000
smoker | 0.058 | 0.234 | 0.238 | 0.416 | 0.384 | 0.289 | 0.358 | 0.543 | 1.000 | 1.000 | 0.000 | 0.157 | 1.000 | 0.560 | 0.000 | 0.000 | 0.091
target_condition | 0.485 | 0.426 | 0.417 | 0.414 | 0.531 | 0.419 | 0.533 | 0.541 | 0.545 | 0.788 | 1.000 | 0.184 | 0.560 | 1.000 | 0.315 | 0.422 | 0.707
tumor_staging_ajcc | 0.091 | 0.153 | 0.178 | 0.091 | 0.064 | 1.000 | 0.527 | 0.243 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.315 | 1.000 | 0.943 | 1.000
tumor_staging_tnm | 0.146 | 0.146 | 0.161 | 0.161 | 1.000 | 1.000 | 1.000 | 0.365 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.422 | 0.943 | 1.000 | 1.000
westernized | 0.151 | 0.166 | 0.174 | 0.134 | 0.061 | 0.303 | 0.124 | 0.911 | 1.000 | 1.000 | 1.000 | 0.000 | 0.091 | 0.707 | 1.000 | 1.000 | 1.000
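
To relate the matrices above to the "High correlation" alerts, the following sketch lists variable pairs whose coefficient meets a cutoff. It is an illustration only: the function name `high_correlation_pairs` and the 0.5 threshold are assumptions, not values stated in this report, and only four variables from the concatenated_md_report matrix are copied in to keep the example short.

```python
from itertools import combinations


def high_correlation_pairs(matrix, threshold=0.5):
    """Return (a, b, value) for each unordered pair at or above the threshold.

    `threshold` is a hypothetical cutoff chosen for illustration.
    """
    pairs = []
    for a, b in combinations(list(matrix), 2):
        value = matrix[a][b]
        if abs(value) >= threshold:
            pairs.append((a, b, value))
    # Strongest associations first.
    return sorted(pairs, key=lambda p: -abs(p[2]))


# Excerpt of the concatenated_md_report matrix (values copied from the table).
matrix = {
    "age_group": {"age_group": 1.000, "age_max": 0.711, "age_years": 0.844, "sex": 0.065},
    "age_max": {"age_group": 0.711, "age_max": 1.000, "age_years": 1.000, "sex": 0.082},
    "age_years": {"age_group": 0.844, "age_max": 1.000, "age_years": 1.000, "sex": 0.110},
    "sex": {"age_group": 0.065, "age_max": 0.082, "age_years": 0.110, "sex": 1.000},
}

flagged = high_correlation_pairs(matrix)
for a, b, value in flagged:
    print(f"{a} ~ {b}: {value:.3f}")
```

With these four variables, the age fields are flagged against each other while `sex` is not, which is consistent with the age-related alerts listed in the overview.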

Missing values

Each of the following three plots is shown for both curated_md_report and concatenated_md_report; the images are not reproduced.

[Figure] A simple visualization of nullity by column.

[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
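
The quantities behind these missing-value plots can be sketched in a few lines. The example below is a minimal illustration using only the standard library: it counts missing values per column and computes the Pearson correlation between two columns' presence indicators, which is one common way to define nullity correlation. The helper names and the four sample rows are hypothetical; only the column names come from this report.

```python
from math import sqrt


def nullity_counts(rows, columns):
    """Number of missing (None) values per column."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}


def nullity_correlation(rows, a, b):
    """Pearson correlation between the presence indicators of columns a and b.

    Returns None when either column is always present or always missing,
    because the correlation is undefined in that case.
    """
    xs = [0.0 if r.get(a) is None else 1.0 for r in rows]
    ys = [0.0 if r.get(b) is None else 1.0 for r in rows]
    n = len(rows)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


# Hypothetical rows: smoker and its ontology term are missing together.
rows = [
    {"sex": "Female", "smoker": None, "smoker_ontology_term_id": None},
    {"sex": "Male", "smoker": "Non-smoker (finding)", "smoker_ontology_term_id": "SNOMED:8392000"},
    {"sex": "Male", "smoker": None, "smoker_ontology_term_id": None},
    {"sex": "Female", "smoker": "Smoker (finding)", "smoker_ontology_term_id": "SNOMED:77176002"},
]
columns = ["sex", "smoker", "smoker_ontology_term_id"]

counts = nullity_counts(rows, columns)
paired = nullity_correlation(rows, "smoker", "smoker_ontology_term_id")
constant = nullity_correlation(rows, "sex", "smoker")
print(counts, paired, constant)
```

A value near 1 means two columns tend to be present or missing together; a value near -1 means one tends to be present when the other is missing.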

Sample

curated_md_report

(index) | study_name | sample_id | age_years | age_min | age_max | age_group | age_group_ontology_term_id | biomarker | body_site | body_site_ontology_term_id | country | country_ontology_term_id | dietary_restriction | feces_phenotype_metric | feces_phenotype_value | feces_phenotype_metric_ontology_term_id | fmt_role | fmt_id | sex | sex_ontology_term_id | hla | hla_ontology_term_id | smoker | smoker_ontology_term_id | control | control_ontology_term_id | target_condition | target_condition_ontology_term_id | disease | disease_ontology_term_id | antibiotics_current_use | treatment | treatment_ontology_term_id | tumor_staging_ajcc | tumor_staging_tnm | unmetadata | westernized
0 | AsnicarF_2017 | MV_FEI1_t1Q14 | 0.246575 | 0.246575 | 0.246575 | Infant | NCIT:C27956 | NaN | feces | UBERON:0001988 | Italy | NCIT:C16761 | NaN | NaN | NaN | NaN | NaN | NaN | Female | NCIT:C16576 | NaN | NaN | NaN | NaN | Study Control | NCIT:C142703 | human gut microbiome | OHMI:0000020 | Healthy | NCIT:C115935 | NaN | NaN | NaN | NaN | NaN | NaN | Yes
1 | AsnicarF_2017 | MV_FEI2_t1Q14 | 0.246575 | 0.246575 | 0.246575 | Infant | NCIT:C27956 | NaN | feces | UBERON:0001988 | Italy | NCIT:C16761 | NaN | NaN | NaN | NaN | NaN | NaN | Male | NCIT:C20197 | NaN | NaN | NaN | NaN | Study Control | NCIT:C142703 | human gut microbiome | OHMI:0000020 | Healthy | NCIT:C115935 | NaN | NaN | NaN | NaN | NaN | NaN | Yes
2 | AsnicarF_2017 | MV_FEI3_t1Q14 | 0.246575 | 0.246575 | 0.246575 | Infant | NCIT:C27956 | NaN | feces | UBERON:0001988 | Italy | NCIT:C16761 | NaN | NaN | NaN | NaN | NaN | NaN | Male | NCIT:C20197 | NaN | NaN | NaN | NaN | Study Control | NCIT:C142703 | human gut microbiome | OHMI:0000020 | Healthy | NCIT:C115935 | NaN | NaN | NaN | NaN | NaN | NaN | Yes
3 | AsnicarF_2017 | MV_FEI4_t1Q14 | 1.000000 | 1.000000 | 1.000000 | Infant | NCIT:C27956 | NaN | feces | UBERON:0001988 | Italy | NCIT:C16761 | NaN | NaN | NaN | NaN | NaN | NaN | Male | NCIT:C20197 | NaN | NaN | NaN | NaN | Study Control | NCIT:C142703 | human gut microbiome | OHMI:0000020 | Healthy | NCIT:C115935 | NaN | NaN | NaN | NaN | NaN | NaN | Yes
4 | AsnicarF_2017 | MV_FEI4_t2Q15 | 1.000000 | 1.000000 | 1.000000 | Infant | NCIT:C27956 | NaN | feces | UBERON:0001988 | Italy | NCIT:C16761 | NaN | NaN | NaN | NaN | NaN | NaN | Male | NCIT:C20197 | NaN | NaN | NaN | NaN | Study Control | NCIT:C142703 | human gut microbiome | OHMI:0000020 | Healthy | NCIT:C115935 | NaN | NaN | NaN | NaN | NaN | NaN | Yes
5 | AsnicarF_2017 | MV_FEI5_t1Q14 | 1.000000 | 1.000000 | 1.000000 | Infant | NCIT:C27956 | NaN | feces | UBERON:0001988 | Italy | NCIT:C16761 | NaN | NaN | NaN | NaN | NaN | NaN | Male | NCIT:C20197 | NaN | NaN | NaN | NaN | Study Control | NCIT:C142703 | human gut microbiome | OHMI:0000020 | Healthy | NCIT:C115935 | NaN | NaN | NaN | NaN | NaN | NaN | Yes
6 | AsnicarF_2017 | MV_FEI5_t2Q14 | 1.000000 | 1.000000 | 1.000000 | Infant | NCIT:C27956 | NaN | feces | UBERON:0001988 | Italy | NCIT:C16761 | NaN | NaN | NaN | NaN | NaN | NaN | Male | NCIT:C20197 | NaN | NaN | NaN | NaN | Study Control | NCIT:C142703 | human gut microbiome | OHMI:0000020 | Healthy | NCIT:C115935 | NaN | NaN | NaN | NaN | NaN | NaN | Yes
7 | AsnicarF_2017 | MV_FEI5_t3Q15 | 1.000000 | 1.000000 | 1.000000 | Infant | NCIT:C27956 | NaN | feces | UBERON:0001988 | Italy | NCIT:C16761 | NaN | NaN | NaN | NaN | NaN | NaN | Male | NCIT:C20197 | NaN | NaN | NaN | NaN | Study Control | NCIT:C142703 | human gut microbiome | OHMI:0000020 | Healthy | NCIT:C115935 | NaN | NaN | NaN | NaN | NaN | NaN | Yes
8 | AsnicarF_2017 | MV_FEM1_t1Q14 | NaN | 18.000000 | 65.000000 | Adult | NCIT:C49685 | NaN | feces | UBERON:0001988 | Italy | NCIT:C16761 | NaN | NaN | NaN | NaN | NaN | NaN | Female | NCIT:C16576 | NaN | NaN | NaN | NaN | Study Control | NCIT:C142703 | human gut microbiome | OHMI:0000020 | Healthy | NCIT:C115935 | NaN | NaN | NaN | NaN | NaN | NaN | Yes
9 | AsnicarF_2017 | MV_FEM2_t1Q14 | NaN | 18.000000 | 65.000000 | Adult | NCIT:C49685 | NaN | feces | UBERON:0001988 | Italy | NCIT:C16761 | NaN | NaN | NaN | NaN | NaN | NaN | Female | NCIT:C16576 | NaN | NaN | NaN | NaN | Study Control | NCIT:C142703 | human gut microbiome | OHMI:0000020 | Healthy | NCIT:C115935 | NaN | NaN | NaN | NaN | NaN | NaN | Yes

concatenated_md_report

(index) | study_name | sample_id | age_years | age_min | age_max | age_group | biomarker | body_site | country | dietary_restriction | feces_phenotype_metric | feces_phenotype_value | fmt_id | sex | smoker | control | target_condition | disease | antibiotics_current_use | treatment | tumor_staging_ajcc | tumor_staging_tnm | unmetadata | westernized
0 | AsnicarF_2017 | MV_FEI1_t1Q14 | 0.246575 | 0.246575 | 0.246575 | Infant | NaN | feces | Italy | NaN | NaN | NaN | NaN | Female | NaN | Study Control | human gut microbiome | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
1 | AsnicarF_2017 | MV_FEI2_t1Q14 | 0.246575 | 0.246575 | 0.246575 | Infant | NaN | feces | Italy | NaN | NaN | NaN | NaN | Male | NaN | Study Control | human gut microbiome | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
2 | AsnicarF_2017 | MV_FEI3_t1Q14 | 0.246575 | 0.246575 | 0.246575 | Infant | NaN | feces | Italy | NaN | NaN | NaN | NaN | Male | NaN | Study Control | human gut microbiome | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
3 | AsnicarF_2017 | MV_FEI4_t1Q14 | 1.000000 | 1.000000 | 1.000000 | Infant | NaN | feces | Italy | NaN | NaN | NaN | NaN | Male | NaN | Study Control | human gut microbiome | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
4 | AsnicarF_2017 | MV_FEI4_t2Q15 | 1.000000 | 1.000000 | 1.000000 | Infant | NaN | feces | Italy | NaN | NaN | NaN | NaN | Male | NaN | Study Control | human gut microbiome | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
5 | AsnicarF_2017 | MV_FEI5_t1Q14 | 1.000000 | 1.000000 | 1.000000 | Infant | NaN | feces | Italy | NaN | NaN | NaN | NaN | Male | NaN | Study Control | human gut microbiome | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
6 | AsnicarF_2017 | MV_FEI5_t2Q14 | 1.000000 | 1.000000 | 1.000000 | Infant | NaN | feces | Italy | NaN | NaN | NaN | NaN | Male | NaN | Study Control | human gut microbiome | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
7 | AsnicarF_2017 | MV_FEI5_t3Q15 | 1.000000 | 1.000000 | 1.000000 | Infant | NaN | feces | Italy | NaN | NaN | NaN | NaN | Male | NaN | Study Control | human gut microbiome | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
8 | AsnicarF_2017 | MV_FEM1_t1Q14 | NaN | 18.000000 | 65.000000 | Adult | NaN | feces | Italy | NaN | NaN | NaN | NaN | Female | NaN | Study Control | human gut microbiome | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
9 | AsnicarF_2017 | MV_FEM2_t1Q14 | NaN | 18.000000 | 65.000000 | Adult | NaN | feces | Italy | NaN | NaN | NaN | NaN | Female | NaN | Study Control | human gut microbiome | Healthy | NaN | NaN | NaN | NaN | NaN | Yes

curated_md_report

(index) | study_name | sample_id | age_years | age_min | age_max | age_group | age_group_ontology_term_id | biomarker | body_site | body_site_ontology_term_id | country | country_ontology_term_id | dietary_restriction | feces_phenotype_metric | feces_phenotype_value | feces_phenotype_metric_ontology_term_id | fmt_role | fmt_id | sex | sex_ontology_term_id | hla | hla_ontology_term_id | smoker | smoker_ontology_term_id | control | control_ontology_term_id | target_condition | target_condition_ontology_term_id | disease | disease_ontology_term_id | antibiotics_current_use | treatment | treatment_ontology_term_id | tumor_staging_ajcc | tumor_staging_tnm | unmetadata | westernized
21871 | ZhuF_2020 | wHAXPI034926-15 | 22.0 | 22.0 | 22.0 | Adult | NCIT:C49685 | Diastolic_Blood_Pressure_in_mm/Hg:70;Systolic_Blood_Pressure_in_mm/Hg:112 | feces | UBERON:0001988 | China | NCIT:C16428 | NaN | NaN | NaN | NaN | NaN | NaN | Male | NCIT:C20197 | NaN | NaN | Non-smoker (finding) | SNOMED:8392000 | Study Control | NCIT:C142703 | Schizophrenia | NCIT:C3362 | Healthy | NCIT:C115935 | no | NaN | NaN | NaN | NaN | NaN | Yes
21872 | ZhuF_2020 | wHAXPI037144-8 | 19.0 | 19.0 | 19.0 | Adult | NCIT:C49685 | Diastolic_Blood_Pressure_in_mm/Hg:74;Systolic_Blood_Pressure_in_mm/Hg:107 | feces | UBERON:0001988 | China | NCIT:C16428 | NaN | NaN | NaN | NaN | NaN | NaN | Female | NCIT:C16576 | NaN | NaN | Non-smoker (finding) | SNOMED:8392000 | Case | NCIT:C49152 | Schizophrenia | NCIT:C3362 | Schizophrenia | NCIT:C3362 | no | NaN | NaN | NaN | NaN | NaN | Yes
21873 | ZhuF_2020 | wHAXPI037145-9 | 17.0 | 17.0 | 17.0 | Adolescent | NCIT:C27954 | Diastolic_Blood_Pressure_in_mm/Hg:79;Systolic_Blood_Pressure_in_mm/Hg:137 | feces | UBERON:0001988 | China | NCIT:C16428 | NaN | NaN | NaN | NaN | NaN | NaN | Male | NCIT:C20197 | NaN | NaN | Non-smoker (finding) | SNOMED:8392000 | Case | NCIT:C49152 | Schizophrenia | NCIT:C3362 | Schizophrenia | NCIT:C3362 | no | NaN | NaN | NaN | NaN | NaN | Yes
21874 | ZhuF_2020 | wHAXPI037146-11 | 20.0 | 20.0 | 20.0 | Adult | NCIT:C49685 | Diastolic_Blood_Pressure_in_mm/Hg:80;Systolic_Blood_Pressure_in_mm/Hg:120 | feces | UBERON:0001988 | China | NCIT:C16428 | NaN | NaN | NaN | NaN | NaN | NaN | Female | NCIT:C16576 | NaN | NaN | Non-smoker (finding) | SNOMED:8392000 | Case | NCIT:C49152 | Schizophrenia | NCIT:C3362 | Schizophrenia | NCIT:C3362 | no | NaN | NaN | NaN | NaN | NaN | Yes
21875 | ZhuF_2020 | wHAXPI037147-12 | 17.0 | 17.0 | 17.0 | Adolescent | NCIT:C27954 | Diastolic_Blood_Pressure_in_mm/Hg:85;Systolic_Blood_Pressure_in_mm/Hg:115 | feces | UBERON:0001988 | China | NCIT:C16428 | NaN | NaN | NaN | NaN | NaN | NaN | Female | NCIT:C16576 | NaN | NaN | Non-smoker (finding) | SNOMED:8392000 | Case | NCIT:C49152 | Schizophrenia | NCIT:C3362 | Schizophrenia | NCIT:C3362 | no | NaN | NaN | NaN | NaN | NaN | Yes
21876 | ZhuF_2020 | wHAXPI043592-8 | 37.0 | 37.0 | 37.0 | Adult | NCIT:C49685 | Diastolic_Blood_Pressure_in_mm/Hg:81;Systolic_Blood_Pressure_in_mm/Hg:120 | feces | UBERON:0001988 | China | NCIT:C16428 | NaN | NaN | NaN | NaN | NaN | NaN | Female | NCIT:C16576 | NaN | NaN | Non-smoker (finding) | SNOMED:8392000 | Case | NCIT:C49152 | Schizophrenia | NCIT:C3362 | Schizophrenia;Schizophrenia,repeated | NCIT:C3362;EUPATH:0001011 | no | NaN | NaN | NaN | NaN | NaN | Yes
21877 | ZhuF_2020 | wHAXPI043593-9 | 40.0 | 40.0 | 40.0 | Adult | NCIT:C49685 | Diastolic_Blood_Pressure_in_mm/Hg:78;Systolic_Blood_Pressure_in_mm/Hg:117 | feces | UBERON:0001988 | China | NCIT:C16428 | NaN | NaN | NaN | NaN | NaN | NaN | Male | NCIT:C20197 | NaN | NaN | Smoker (finding) | SNOMED:77176002 | Case | NCIT:C49152 | Schizophrenia | NCIT:C3362 | Schizophrenia;Schizophrenia,repeated | NCIT:C3362;EUPATH:0001011 | no | NaN | NaN | NaN | NaN | NaN | Yes
21878 | ZhuF_2020 | wHAXPI043594-11 | 25.0 | 25.0 | 25.0 | Adult | NCIT:C49685 | Diastolic_Blood_Pressure_in_mm/Hg:83;Systolic_Blood_Pressure_in_mm/Hg:125 | feces | UBERON:0001988 | China | NCIT:C16428 | NaN | NaN | NaN | NaN | NaN | NaN | Male | NCIT:C20197 | NaN | NaN | Smoker (finding) | SNOMED:77176002 | Case | NCIT:C49152 | Schizophrenia | NCIT:C3362 | Schizophrenia;Schizophrenia,repeated | NCIT:C3362;EUPATH:0001011 | no | NaN | NaN | NaN | NaN | NaN | Yes
21879 | ZhuF_2020 | wHAXPI047830-11 | 39.0 | 39.0 | 39.0 | Adult | NCIT:C49685 | Diastolic_Blood_Pressure_in_mm/Hg:80;Systolic_Blood_Pressure_in_mm/Hg:120 | feces | UBERON:0001988 | China | NCIT:C16428 | NaN | NaN | NaN | NaN | NaN | NaN | Female | NCIT:C16576 | NaN | NaN | Non-smoker (finding) | SNOMED:8392000 | Case | NCIT:C49152 | Schizophrenia | NCIT:C3362 | Schizophrenia;Schizophrenia,repeated | NCIT:C3362;EUPATH:0001011 | no | NaN | NaN | NaN | NaN | NaN | Yes
21880 | ZhuF_2020 | wHAXPI048670-90 | 38.0 | 38.0 | 38.0 | Adult | NCIT:C49685 | Diastolic_Blood_Pressure_in_mm/Hg:80;Systolic_Blood_Pressure_in_mm/Hg:120 | feces | UBERON:0001988 | China | NCIT:C16428 | NaN | NaN | NaN | NaN | NaN | NaN | Female | NCIT:C16576 | NaN | NaN | Non-smoker (finding) | SNOMED:8392000 | Case | NCIT:C49152 | Schizophrenia | NCIT:C3362 | Schizophrenia;Schizophrenia,repeated | NCIT:C3362;EUPATH:0001011 | no | NaN | NaN | NaN | NaN | NaN | Yes

concatenated_md_report

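(index) | study_name | sample_id | age_years | age_min | age_max | age_group | biomarker | body_site | country | dietary_restriction | feces_phenotype_metric | feces_phenotype_value | fmt_id | sex | smoker | control | target_condition | disease | antibiotics_current_use | treatment | tumor_staging_ajcc | tumor_staging_tnm | unmetadata | westernized
22578 | YassourM_2018 | G102213 | NaN | NaN | NaN | Adult | NaN | feces | Fiji | NaN | NaN | NaN | NaN | Female | NaN | NaN | Schizophrenia | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
22579 | YassourM_2018 | G104686 | NaN | NaN | NaN | Infant | NaN | feces | Fiji | NaN | NaN | NaN | NaN | Male | NaN | NaN | Schizophrenia | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
22580 | YassourM_2018 | G102217 | NaN | NaN | NaN | Infant | NaN | feces | Fiji | NaN | NaN | NaN | NaN | Male | NaN | NaN | Schizophrenia | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
22581 | YassourM_2018 | G102218 | NaN | NaN | NaN | Infant | NaN | feces | Fiji | NaN | NaN | NaN | NaN | Male | NaN | NaN | Schizophrenia | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
22582 | YassourM_2018 | G102211 | NaN | NaN | NaN | Adult | NaN | feces | Fiji | NaN | NaN | NaN | NaN | Female | NaN | NaN | Schizophrenia | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
22583 | YassourM_2018 | G102212 | NaN | NaN | NaN | Adult | NaN | feces | Fiji | NaN | NaN | NaN | NaN | Female | NaN | NaN | Schizophrenia | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
22584 | YassourM_2018 | G104681 | NaN | NaN | NaN | Adult | NaN | feces | Fiji | NaN | NaN | NaN | NaN | Female | NaN | NaN | Schizophrenia | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
22585 | YassourM_2018 | G102214 | NaN | NaN | NaN | Infant | NaN | feces | Fiji | NaN | NaN | NaN | NaN | Male | NaN | NaN | Schizophrenia | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
22586 | YassourM_2018 | G102215 | NaN | NaN | NaN | Infant | NaN | feces | Fiji | NaN | NaN | NaN | NaN | Male | NaN | NaN | Schizophrenia | Healthy | NaN | NaN | NaN | NaN | NaN | Yes
22587 | YassourM_2018 | G102216 | NaN | NaN | NaN | Infant | NaN | feces | Fiji | NaN | NaN | NaN | NaN | Male | NaN | NaN | Schizophrenia | Healthy | NaN | NaN | NaN | NaN | NaN | Yes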

Duplicate rows

curated_md_report

Dataset does not contain duplicate rows.

concatenated_md_report

Dataset does not contain duplicate rows.
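
A duplicate-row check of the kind reported above can be sketched as follows. This is an illustration under stated assumptions, not the profiler's implementation: rows are compared on all of their columns, missing values are represented as `None`, and the helper name `duplicate_rows` and the sample rows are hypothetical.

```python
from collections import Counter


def duplicate_rows(rows):
    """Map each fully repeated row (as a sorted tuple of items) to its count."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical rows using two column names from the report.
unique_rows = [
    {"study_name": "A", "sample_id": "s1"},
    {"study_name": "A", "sample_id": "s2"},
]
repeated_rows = unique_rows + [{"study_name": "A", "sample_id": "s1"}]

print(duplicate_rows(unique_rows))    # empty: no duplicate rows
print(duplicate_rows(repeated_rows))  # one row seen twice
```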